-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Fix HNSW resize conditions to exclude deleted items #3424
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
Going to merge since Sanket has already looked over it all. |
chroma-droid
pushed a commit
that referenced
this pull request
Jan 7, 2025
## Description of changes We were seeing errors in compaction that looked like: ``` WriteSegments(ApplyMaterializatedLogsError(HnswIndex(FFIException(\"The number of elements exceeds the specified limit\")))) ``` This is because the resize logic uses `len` which removes deleted elements which then causes the conditional for resizing to be skipped. This adds a new method `len_with_deleted` to return the length of the index including the elements that have been deleted so that the resize is not skipped. See: https://github.com/chroma-core/chroma/blob/27564afa0e82d184a0fa853a7384d00861d8e9dc/rust/worker/src/segment/distributed_hnsw_segment.rs#L212-L222 ## Test plan A test was added. It will fail when using the existing `len` method, but passes when using the newly added `len_with_deleted` method. - [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust ## Documentation Changes N/A Thanks to @sanketkedia for helping root cause this. Fixes chroma-core/hosted-chroma#855
chroma-droid
pushed a commit
that referenced
this pull request
Jan 7, 2025
## Description of changes We were seeing errors in compaction that looked like: ``` WriteSegments(ApplyMaterializatedLogsError(HnswIndex(FFIException(\"The number of elements exceeds the specified limit\")))) ``` This is because the resize logic uses `len` which removes deleted elements which then causes the conditional for resizing to be skipped. This adds a new method `len_with_deleted` to return the length of the index including the elements that have been deleted so that the resize is not skipped. See: https://github.com/chroma-core/chroma/blob/27564afa0e82d184a0fa853a7384d00861d8e9dc/rust/worker/src/segment/distributed_hnsw_segment.rs#L212-L222 ## Test plan A test was added. It will fail when using the existing `len` method, but passes when using the newly added `len_with_deleted` method. - [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust ## Documentation Changes N/A Thanks to @sanketkedia for helping root cause this. Fixes chroma-core/hosted-chroma#855
eculver
added a commit
that referenced
this pull request
Jan 7, 2025
This PR cherry-picks the commit 6c614ec onto release/2025-01-03. If there are unresolved conflicts, please resolve them manually. Co-authored-by: Evan Culver <[email protected]>
eculver
added a commit
that referenced
this pull request
Jan 8, 2025
## Description of changes This is a follow-up from #3424 - where I had added code to fix the bug in compaction but then forgot to replace the method at the actual callsite with the new method 🤦🏼 . This updates the problem code to use `len_with_deleted` instead of just `len` so that the resize actually gets triggered. Some additional context: we decided not to change the behavior of the existing `len` method in case other callers were depending on existing behavior. In other words, we didn't want to introduce accidentally introduce bugs that were relying on the "buggy" code.
chroma-droid
pushed a commit
that referenced
this pull request
Jan 8, 2025
## Description of changes This is a follow-up from #3424 - where I had added code to fix the bug in compaction but then forgot to replace the method at the actual callsite with the new method 🤦🏼 . This updates the problem code to use `len_with_deleted` instead of just `len` so that the resize actually gets triggered. Some additional context: we decided not to change the behavior of the existing `len` method in case other callers were depending on existing behavior. In other words, we didn't want to introduce accidentally introduce bugs that were relying on the "buggy" code.
chroma-droid
pushed a commit
that referenced
this pull request
Jan 8, 2025
## Description of changes This is a follow-up from #3424 - where I had added code to fix the bug in compaction but then forgot to replace the method at the actual callsite with the new method 🤦🏼 . This updates the problem code to use `len_with_deleted` instead of just `len` so that the resize actually gets triggered. Some additional context: we decided not to change the behavior of the existing `len` method in case other callers were depending on existing behavior. In other words, we didn't want to introduce accidentally introduce bugs that were relying on the "buggy" code.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description of changes
We were seeing errors in compaction that looked like:
This is because the resize logic uses
len
which removes deleted elements which then causes the conditional for resizing to be skipped. This adds a new methodlen_with_deleted
to return the length of the index including the elements that have been deleted so that the resize is not skipped. See:chroma/rust/worker/src/segment/distributed_hnsw_segment.rs
Lines 212 to 222 in 27564af
Test plan
A test was added. It will fail when using the existing
len
method, but passes when using the newly addedlen_with_deleted
method.pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
N/A
Thanks to @sanketkedia for helping root cause this.
Fixes https://github.com/chroma-core/hosted-chroma/issues/855