Fix graph merge stats size calculation #1844

Merged
merged 9 commits into from
Aug 8, 2024

Conversation

ryanbogan
Member

@ryanbogan ryanbogan commented Jul 17, 2024

Description

Fixes the calculation of merge sizes in the graph stats section of the KNNStats API. This PR changes the logic so that values are rounded to the correct number of bytes.
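To illustrate the kind of rounding problem described here: the diff context quoted in the review thread below contains the expression `vectorsSize += vectorsSize % JAVA_ROUNDING_NUMBER`. Adding the remainder does not, in general, land on a multiple of the rounding number. The sketch below contrasts that with padding up to the next multiple. The class name, method names, and the value 8 are assumptions made for illustration and are not taken from the PR. The commit list also mentions removing Java overhead from the calculations, so the merged code may not round at all.

```java
// Illustrative sketch only: names and the rounding value are assumptions,
// not the plugin's actual code.
final class RoundingSketch {
    // Assumed alignment unit in bytes (hypothetical value).
    static final long ROUNDING_NUMBER = 8;

    // Adds the remainder itself; the result is usually NOT a multiple.
    static long addRemainder(long size) {
        return size + size % ROUNDING_NUMBER;
    }

    // Pads up to the next multiple; already-aligned sizes are unchanged.
    static long roundUp(long size) {
        long remainder = size % ROUNDING_NUMBER;
        return remainder == 0 ? size : size + (ROUNDING_NUMBER - remainder);
    }

    public static void main(String[] args) {
        System.out.println(addRemainder(10)); // 12, which is not a multiple of 8
        System.out.println(roundUp(10));      // 16
    }
}
```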

Continuation of #1818, which had too many merge conflicts to fix cleanly.

Issues Resolved

#1789

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ryan Bogan <[email protected]>
@ryanbogan
Member Author

BWC failures are unrelated to this PR

luyuncheng
luyuncheng previously approved these changes Jul 18, 2024
Collaborator

@luyuncheng luyuncheng left a comment

LGTM

Signed-off-by: Ryan Bogan <[email protected]>
@ryanbogan ryanbogan requested a review from heemin32 July 18, 2024 19:14
@ryanbogan ryanbogan added v2.17.0 and removed v2.16.0 labels Jul 23, 2024
heemin32
heemin32 previously approved these changes Jul 26, 2024
vectorsSize += vectorsSize % JAVA_ROUNDING_NUMBER;
}
return vectorsSize;
if (serializationMode == SerializationMode.COLLECTIONS_OF_BYTES) {
Collaborator

can we get rid of this serializationMode attribute completely?

Member Author

@ryanbogan ryanbogan Jul 29, 2024

This method is typically used to calculate the array size from a Pair. The issue is that the KNNCodecUtil.Pair class only has doc IDs, a vector address, a dimension, and a serialization mode as instance variables. Therefore, without reading memory at the vector address, I don't think it's possible to tell whether the data is floats or bytes without the serialization mode.

Collaborator

Can we use vector datatype to know if the vector is byte or float?

Member Author

Yeah, I think that would work. The binary type would use the same calculation as byte, right?

Collaborator

yes same thing
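
A minimal sketch of the approach agreed on in this thread, choosing the per-element byte width from the vector data type instead of the serialization mode. The enum, class, and method names here are hypothetical stand-ins, not the plugin's real types, and `vectorLength` is assumed to mean the number of stored elements per vector.

```java
// Illustrative sketch only: VectorDataType here is a hypothetical stand-in,
// not the plugin's actual enum.
final class VectorSizeSketch {
    enum VectorDataType { FLOAT, BYTE, BINARY }

    // Returns the raw size in bytes of numVectors vectors, each holding
    // vectorLength stored elements. No JVM object overhead is included.
    static long calculateArraySize(int numVectors, int vectorLength, VectorDataType type) {
        switch (type) {
            case FLOAT:
                // Four bytes per element.
                return (long) numVectors * vectorLength * Float.BYTES;
            case BYTE:
            case BINARY:
                // One byte per stored element; binary uses the same calculation as byte.
                return (long) numVectors * vectorLength;
            default:
                throw new IllegalArgumentException("Unsupported vector data type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(calculateArraySize(10, 128, VectorDataType.FLOAT)); // 5120
        System.out.println(calculateArraySize(10, 128, VectorDataType.BYTE));  // 1280
    }
}
```

Casting to `long` before multiplying avoids integer overflow for large segments.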

CHANGELOG.md (outdated review thread, resolved)
@ryanbogan ryanbogan merged commit e3158f9 into opensearch-project:main Aug 8, 2024
27 of 28 checks passed
@ryanbogan ryanbogan deleted the graph_size_bug_v2 branch August 8, 2024 00:18
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1844-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e3158f990d058b02568da617688fd4857d0d521b
# Push it to GitHub
git push --set-upstream origin backport/backport-1844-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1844-to-2.x.

ryanbogan added a commit that referenced this pull request Aug 8, 2024
* Fix graph merge stats size calculation

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Make calculations easier to read

Signed-off-by: Ryan Bogan <[email protected]>

* Remove java overhead from calculations

Signed-off-by: Ryan Bogan <[email protected]>

* Change from serialization mode to vector data type for calculations

Signed-off-by: Ryan Bogan <[email protected]>

* Minor change to if statements

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit e3158f9)
ryanbogan added a commit that referenced this pull request Aug 9, 2024