Add CompressionLevel Calculation for PQ #2200

jmazanec15 · 2024-10-09T21:08:59Z

Description

Currently, for product quantization, we set the calculated compression level to NOT_CONFIGURED. The main issue with this is that if a user sets up a disk-based index with PQ, no re-scoring will happen by default.

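This change adds the calculation so that the proper re-scoring will happen. The formula is fairly straightforward:
actual compression = (d * 32) / (m * code_size), where d is the vector dimension (32 bits per float component), m is the number of PQ subvectors, and code_size is the number of bits per subvector code. We then round to the nearest compression level, because we only support discrete compression levels.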

One small issue with this is that if PQ is configured to have compression > 32x, the value will be 32x. Functionally, the only issue will be that we may not be as aggressive on oversampling for on disk mode.
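
As a rough illustration of the calculation described above, here is a minimal Java sketch. It is not the code from this PR: the class name, the method names, and the list of levels (1x through 32x) are assumptions made for the example, and "nearest" is interpreted as smallest absolute distance with ties going to the lower level. Under that interpretation, any value above the largest level lands on the largest level, which matches the capping behavior noted above.

```java
import java.util.List;

// Illustrative sketch only; not the plugin's actual implementation.
final class PqCompressionSketch {

    // Assumed discrete levels (as multipliers), ordered smallest to largest.
    static final List<Integer> LEVELS = List.of(1, 2, 4, 8, 16, 32);

    // actual compression = (d * 32) / (m * code_size)
    static double actualCompression(int dimension, int m, int codeSize) {
        if (dimension <= 0 || m <= 0 || codeSize <= 0) {
            throw new IllegalArgumentException("dimension, m and code_size must be positive");
        }
        return (dimension * 32.0) / ((double) m * codeSize);
    }

    // Picks the level with the smallest absolute distance; ties go to the lower level.
    // Anything above the largest level therefore lands on the largest level.
    static int nearestLevel(double compression) {
        int best = LEVELS.get(0);
        for (int level : LEVELS) {
            if (Math.abs(level - compression) < Math.abs(best - compression)) {
                best = level;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // d = 768, m = 96, code_size = 8: 24576 / 768 = 32
        System.out.println(nearestLevel(actualCompression(768, 96, 8)));
        // d = 128, m = 8, code_size = 8: 4096 / 64 = 64, reported as 32
        System.out.println(nearestLevel(actualCompression(128, 8, 8)));
    }
}
```

The second call shows the limitation mentioned in the description: a 64x PQ configuration is reported as 32x when 32x is the largest supported level.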

Check List

New functionality includes testing.
Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

navneet1v · 2024-10-10T20:38:00Z

One small issue with this is that if PQ is configured to have compression > 32x, the value will be 32x. Functionally, the only issue will be that we may not be as aggressive on oversampling for on disk mode.

Should we allow more compression levels?

navneet1v · 2024-10-10T22:58:31Z

src/main/java/org/opensearch/knn/index/mapper/CompressionLevel.java

@@ -29,6 +29,8 @@ public enum CompressionLevel {
    x16(16, "16x", new RescoreContext(3.0f, false), Set.of(Mode.ON_DISK)),
    x32(32, "32x", new RescoreContext(3.0f, false), Set.of(Mode.ON_DISK));

+    public static final CompressionLevel MAX_COMPRESSION_LEVEL = CompressionLevel.x32;


Should we have 64x as the max compression level? I don't have a solid argument for 64x, but I think having one more compression level is always good.

jmazanec15 · 2024-10-15

I think that makes sense. I guess for 64x, the default for all dimensions should probably be 5x.
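
To make the effect of raising the cap concrete, here is a small hypothetical Java sketch. `LevelSketch` is a simplified stand-in, not the plugin's `CompressionLevel` enum: it models only the upper levels, it reads the first `RescoreContext` argument in the diff above (3.0f) as an oversample factor, which is an assumption, and the 5.0f value for 64x is the default suggested in this thread rather than a confirmed setting. The `capAt` helper is invented for the example and rounds down rather than to the nearest level.

```java
// Hypothetical, simplified stand-in for CompressionLevel; only the upper levels are modeled.
enum LevelSketch {
    X16(16, 3.0f),
    X32(32, 3.0f),
    X64(64, 5.0f); // 5x is the default suggested in this thread, not a confirmed value

    final int multiplier;
    final float oversampleFactor;

    LevelSketch(int multiplier, float oversampleFactor) {
        this.multiplier = multiplier;
        this.oversampleFactor = oversampleFactor;
    }

    // Returns the largest modeled level that is <= compression and <= max.
    static LevelSketch capAt(int compression, LevelSketch max) {
        if (compression < X16.multiplier) {
            throw new IllegalArgumentException("levels below 16x are not modeled in this sketch");
        }
        LevelSketch chosen = X16;
        for (LevelSketch level : values()) {
            if (level.multiplier <= compression && level.multiplier <= max.multiplier) {
                chosen = level;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Same 64x PQ configuration, first capped at 32x, then at 64x.
        System.out.println(capAt(64, X32) + " " + capAt(64, X32).oversampleFactor);
        System.out.println(capAt(64, X64) + " " + capAt(64, X64).oversampleFactor);
    }
}
```

With the cap at 32x, a 64x configuration would pick up the 32x oversample default; with a 64x level available, it can pick up a more aggressive one.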

navneet1v

overall looks good to me. Just 1 minor comment.

heemin32

LGTM. Do we want to add integ test for it? Recall should be higher with rescoring than baseline.

Currently, for product quantization, we set the calculated compression level to NOT_CONFIGURED. The main issue with this is that if a user sets up a disk-based index with PQ, no re-scoring will happen by default.

This change adds the calculation so that the proper re-scoring will happen. The formula is fairly straightforward: actual compression = (d * 32) / (m * code_size). Then, we round to the nearest compression level (because we only support discrete compression levels).

One small issue with this is that if PQ is configured to have compression > 64x, the value will be 64x. Functionally, the only issue will be that we may not be as aggressive on oversampling for on disk mode.

Signed-off-by: John Mazanec <[email protected]>
(cherry picked from commit 228aead)

navneet1v · 2024-11-26T19:26:23Z

@jmazanec15 did we do any benchmarks which suggest how much improvement we will get with PQ based rescoring?

jmazanec15 · 2024-11-26T20:40:33Z

@navneet1v we did here: #1779 (comment)

navneet1v · 2024-11-26T21:35:13Z

@navneet1v we did here: #1779 (comment)

Thanks. I completely forgot about that.

jmazanec15 added the Enhancements and backport 2.x labels Oct 9, 2024

jmazanec15 requested review from heemin32, navneet1v, VijayanB, vamshin, naveentatikonda, junqiu-lei, martin-gaievski, ryanbogan and luyuncheng as code owners October 9, 2024 21:09

jmazanec15 force-pushed the pq-compression-level-fix branch 4 times, most recently from f2d7c89 to 907f1ec October 10, 2024 17:01

navneet1v reviewed Oct 10, 2024

heemin32 reviewed Oct 10, 2024

heemin32 previously approved these changes Oct 10, 2024

jmazanec15 mentioned this pull request Oct 15, 2024

Should we setup e2e tests separately from integ tests? #2208

Open

jmazanec15 added 2 commits October 15, 2024 09:45

Add 64x compression

78915d1

Signed-off-by: John Mazanec <[email protected]>

jmazanec15 dismissed heemin32’s stale review via 78915d1 October 15, 2024 16:55

jmazanec15 force-pushed the pq-compression-level-fix branch from 907f1ec to 78915d1 October 15, 2024 16:55

jmazanec15 requested a review from shatejas as a code owner October 15, 2024 16:55

navneet1v approved these changes Oct 16, 2024

heemin32 approved these changes Oct 16, 2024

jmazanec15 merged commit 228aead into opensearch-project:main Oct 16, 2024
31 checks passed

opensearch-trigger-bot bot mentioned this pull request Oct 16, 2024

[Backport 2.x] Add CompressionLevel Calculation for PQ #2216

Merged
