Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tuning Lucene Codec Block sizes to Optimize Performance #12029

Conversation

sarthakaggarwal97
Copy link
Contributor

@sarthakaggarwal97 sarthakaggarwal97 commented Jan 26, 2024

Description

Adds support to configure chunkSize aka Block Size for the default and best_compression codecs.

Related Issues

Resolves #7475, #3769

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added benchmarking Issues related to benchmarking or performance. enhancement Enhancement or improvement to existing feature or request Indexing:Performance Performance This is for any performance related enhancements or bugs labels Jan 26, 2024
@sarthakaggarwal97 sarthakaggarwal97 force-pushed the configurable-block-size-lucene-codecs branch from 21f71af to 8908ce8 Compare January 26, 2024 07:22
Copy link
Contributor

github-actions bot commented Jan 26, 2024

Compatibility status:

Checks if related components are compatible with change a3fdc2a

Incompatible components

Incompatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git]

Copy link
Contributor

❌ Gradle check result for 21f71af: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@sarthakaggarwal97 sarthakaggarwal97 changed the title Support to Increase Block Size to 16k Tuning Lucene Codec Block sizes to Optimize Performance Jan 26, 2024
@sarthakaggarwal97 sarthakaggarwal97 force-pushed the configurable-block-size-lucene-codecs branch 2 times, most recently from 26bbf9e to 3d3a133 Compare January 26, 2024 07:32
Copy link
Contributor

❌ Gradle check result for 26bbf9e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 8908ce8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 3d3a133: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@sarthakaggarwal97 sarthakaggarwal97 force-pushed the configurable-block-size-lucene-codecs branch from 3d3a133 to f9013a2 Compare January 26, 2024 11:19
Copy link
Contributor

❌ Gradle check result for f9013a2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@sarthakaggarwal97 sarthakaggarwal97 force-pushed the configurable-block-size-lucene-codecs branch from f9013a2 to c2a617f Compare January 26, 2024 12:08
Copy link
Contributor

❌ Gradle check result for c2a617f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@sarthakaggarwal97 sarthakaggarwal97 force-pushed the configurable-block-size-lucene-codecs branch from c2a617f to 9dbf157 Compare January 26, 2024 13:47
Copy link
Contributor

❌ Gradle check result for 9dbf157: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sarthak Aggarwal <[email protected]>
Copy link
Contributor

❌ Gradle check result for 5b40132: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@sarthakaggarwal97 sarthakaggarwal97 force-pushed the configurable-block-size-lucene-codecs branch from 4226705 to a3fdc2a Compare January 26, 2024 15:34
Copy link
Contributor

❌ Gradle check result for 4226705: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

✅ Gradle check result for a3fdc2a: SUCCESS

Copy link

codecov bot commented Jan 26, 2024

Codecov Report

Attention: Patch coverage is 81.81818% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 71.39%. Comparing base (e017a9c) to head (a3fdc2a).
Report is 274 commits behind head on main.

Files Patch % Lines
...ch/index/codec/Lucene99CoreStoredFieldsFormat.java 68.00% 5 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12029      +/-   ##
============================================
+ Coverage     71.36%   71.39%   +0.02%     
- Complexity    59468    59514      +46     
============================================
  Files          4925     4927       +2     
  Lines        279458   279503      +45     
  Branches      40631    40636       +5     
============================================
+ Hits         199446   199557     +111     
+ Misses        63437    63333     -104     
- Partials      16575    16613      +38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sarthakaggarwal97 sarthakaggarwal97 marked this pull request as ready for review January 28, 2024 06:56
public static final CompressionMode BEST_COMPRESSION_MODE = new DeflateWithPresetDictCompressionMode();

// Shoot for 10 sub blocks of 8kB each.
private static final int BEST_SPEED_BLOCK_LENGTH = 10 * 16 * 1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says 8kB while the value is 16kB.

What is the effect of this change w.r.t i) disk size, ii) indexing performance and iii) search performance?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what is the ongoing maintenance burden of this change? It looks like we'll be on the hook to implement new codec wrappers on every Lucene minor version. And is it plausible that future version of Lucene codecs might not expose the configurability we rely on here and then leave us in a bad spot regarding compatibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks folks for taking a look. This PR should have ideally been in draft stage ~ have moved it to draft.
Thanks for bringing these points up, I'm working on the fresh benchmarks, and correctness as of now.
Will get back on these questions in some time!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we'll be on the hook to implement new codec wrappers on every Lucene minor version.

Yes, we will be maintaining new codec wrappers. Let me think if there is another way to get around this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should we let customers control the page size? For example, MySQL lets customers configure the page size (https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_page_size)

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure we think through the maintenance and compatibility implications of a change like this before we merge it as a fully supported option.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Mar 5, 2024
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Apr 8, 2024
@stephen-crawford
Copy link
Contributor

Hi @sarthakaggarwal97 are you still planning on moving forward with this change? I see you created an associated issue (great!) but am not sure what the current status of this is.

@sarthakaggarwal97
Copy link
Contributor Author

@scrawfor99 thanks for reaching out.

I have documented my findings and experiments over here for reference: #3769 (comment)

8K Block size provides the best performance in terms of search latency over stored fields, and for users who are concerned about storage, OpenSearch supports Zstandard via custom-codecs plugin which gives nice trade offs between storage and latency.

I feel there is not enough merit to make this change as it will introduce additional maintenance effort for the community upon every lucene version upgrade.

I will close this PR for now. If the community feels that users can benefit from exposing this option, I am happy to take this forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmarking Issues related to benchmarking or performance. enhancement Enhancement or improvement to existing feature or request Indexing:Performance Performance This is for any performance related enhancements or bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tuning of chunkSize for LZ4 Compression Algorithm to gain performance in Indexing
5 participants