Fix graph merge stats size calculation #1844
Signed-off-by: Ryan Bogan <[email protected]>
BWC failures are unrelated to this PR.
LGTM
src/main/java/org/opensearch/knn/index/codec/util/KNNCodecUtil.java
Outdated
vectorsSize += vectorsSize % JAVA_ROUNDING_NUMBER;
}
return vectorsSize;
if (serializationMode == SerializationMode.COLLECTIONS_OF_BYTES) {
Can we get rid of this serializationMode attribute completely?
This method is typically used to calculate the array size from a Pair. The issue is that the KNNCodecUtil.Pair class only has doc IDs, a vector address, the dimension, and the serialization mode as instance variables. Therefore, without reading the memory at the vector address, I don't think it's possible to tell whether the data is floats or bytes without the serialization mode.
Can we use the vector data type to tell whether the vector is byte or float?
Yeah, I think that would work. The binary type would use the same calculation as byte, right?
Yes, same thing.
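The approach agreed on in this thread can be sketched as follows: derive the bytes per element from the vector data type instead of the serialization mode, and treat binary exactly like byte. This is a minimal sketch under stated assumptions, not the plugin's code: the DataType enum, the VectorBatch record (a simplified stand-in for KNNCodecUtil.Pair that keeps only doc IDs, dimension and data type) and the method names are all hypothetical. It counts only the raw vector payload with no JVM object overhead, and it takes dimension as the number of stored elements per vector; how binary dimensions map to stored bytes is not covered here.

```java
public class VectorSizeSketch {
    // Hypothetical stand-in for the plugin's vector data type.
    enum DataType { FLOAT, BYTE, BINARY }

    // Hypothetical simplified Pair: doc IDs, dimension and data type only (no vector address).
    record VectorBatch(int[] docIds, int dimension, DataType dataType) {}

    // Bytes per stored element: 4 for float, 1 for byte; binary is treated like byte,
    // as agreed in the review thread.
    static long bytesPerElement(DataType type) {
        return type == DataType.FLOAT ? Float.BYTES : Byte.BYTES;
    }

    // Raw payload size: one vector per doc ID, `dimension` elements per vector,
    // no JVM object overhead added.
    static long vectorsSizeInBytes(VectorBatch batch) {
        return (long) batch.docIds().length * batch.dimension() * bytesPerElement(batch.dataType());
    }

    public static void main(String[] args) {
        VectorBatch floats = new VectorBatch(new int[] {0, 1, 2}, 128, DataType.FLOAT);
        VectorBatch bytes = new VectorBatch(new int[] {0, 1, 2}, 128, DataType.BYTE);
        System.out.println(vectorsSizeInBytes(floats)); // 1536
        System.out.println(vectorsSizeInBytes(bytes));  // 384
    }
}
```

Keying the calculation on the data type keeps it independent of how vectors are serialized, which is what lets the serializationMode branch go away.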
src/main/java/org/opensearch/knn/index/codec/util/KNNCodecUtil.java
Outdated
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1844-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e3158f990d058b02568da617688fd4857d0d521b
# Push it to GitHub
git push --set-upstream origin backport/backport-1844-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the
* Fix graph merge stats size calculation
  Signed-off-by: Ryan Bogan <[email protected]>
* Add changelog entry
  Signed-off-by: Ryan Bogan <[email protected]>
* Add javadocs
  Signed-off-by: Ryan Bogan <[email protected]>
* Make calculations easier to read
  Signed-off-by: Ryan Bogan <[email protected]>
* Remove java overhead from calculations
  Signed-off-by: Ryan Bogan <[email protected]>
* Change from serialization mode to vector data type for calculations
  Signed-off-by: Ryan Bogan <[email protected]>
* Minor change to if statements
  Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit e3158f9)
Description
Fixes the size calculation for merges in the graph stats section of the KNNStats API. This PR changes the logic to properly round values to the correct number of bytes.
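On rounding: the review excerpt contains the line `vectorsSize += vectorsSize % JAVA_ROUNDING_NUMBER;`. If the intent of such a line is to round a size up to a multiple of the constant, adding the remainder does not do that in general. The minimal sketch below contrasts that expression with a conventional round-up. It assumes the constant is 8; that value and the helper names are hypothetical, since the actual constant is not shown in this thread.

```java
public class RoundingSketch {
    // Hypothetical stand-in for JAVA_ROUNDING_NUMBER; the real value is not shown in this thread.
    static final long ROUNDING = 8;

    // Mirrors the excerpt's expression: size += size % ROUNDING.
    static long addRemainder(long size) {
        return size + size % ROUNDING;
    }

    // Conventional round-up to the next multiple of ROUNDING (unchanged if already a multiple).
    static long roundUp(long size) {
        return (size + ROUNDING - 1) / ROUNDING * ROUNDING;
    }

    public static void main(String[] args) {
        for (long size : new long[] {12, 13, 16}) {
            System.out.println(size + " -> addRemainder=" + addRemainder(size) + ", roundUp=" + roundUp(size));
        }
    }
}
```

With the assumed constant of 8, a size of 12 becomes 16 under both expressions, but 13 becomes 18 (not a multiple of 8) with the remainder form and 16 with the round-up form.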
Continuation of #1818, which had too many merge conflicts to fix cleanly.
Issues Resolved
#1789
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.