-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stream read pool and default s3 timeouts tuning #10912
Stream read pool and default s3 timeouts tuning #10912
Conversation
8ce5fde
to
f0a373e
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Compatibility status:Checks if related components are compatible with change 6ed696f Incompatible componentsSkipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git] |
f0a373e
to
17013ad
Compare
Gradle Check (Jenkins) Run Completed with:
|
plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3ClientSettings.java
Outdated
Show resolved
Hide resolved
plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3RepositoryPlugin.java
Outdated
Show resolved
Hide resolved
Also, do we have any perf results showing how things have improved? |
This is not about perf. This is to handle upload bursts occurring due to large file uploads. Increasing timeouts is a temporary change till we use a separate dedicated slow client which can handle special cases by putting them behind a producer-consumer queue and processing them slowly without impacting critical uploads. For large uploads happening on a smaller instance, this timeout may still not be sufficient. |
❌ Gradle check result for 3ba52e3: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: vikasvb90 <[email protected]>
…ic collector in async flow Signed-off-by: vikasvb90 <[email protected]>
3ba52e3
to
6ed696f
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #10912 +/- ##
============================================
+ Coverage 71.21% 71.36% +0.14%
- Complexity 58926 59002 +76
============================================
Files 4890 4890
Lines 277434 277447 +13
Branches 40313 40313
============================================
+ Hits 197567 197989 +422
+ Misses 63464 62992 -472
- Partials 16403 16466 +63 ☔ View full report in Codecov by Sentry. |
Signed-off-by: vikasvb90 <[email protected]> (cherry picked from commit 74b2d7d) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 74b2d7d) Signed-off-by: vikasvb90 <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@@ -198,14 +198,14 @@ final class S3ClientSettings { | |||
static final Setting.AffixSetting<Integer> MAX_CONNECTIONS_SETTING = Setting.affixKeySetting( | |||
PREFIX, | |||
"max_connections", | |||
key -> Setting.intSetting(key, 100, Property.NodeScope) | |||
key -> Setting.intSetting(key, 500, Property.NodeScope) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel these should made a function of cores. For eg: instances with fewer cores will run out of higher read timeouts with too many connections since we have limited threads to process
cc: @vikasvb90
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be but we will have to first fix the blocking reads from disk happening in stream reader pool and then tune all timeouts as well as this connection count based on benchmarks.
…0912) Signed-off-by: vikasvb90 <[email protected]>
…0912) Signed-off-by: vikasvb90 <[email protected]>
…0912) Signed-off-by: vikasvb90 <[email protected]>
…0912) Signed-off-by: vikasvb90 <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
Description
On upload of large shards (large segments or more number of smaller segments), we noticed s3 errors. For large segments, api timeouts were observed and for large shards of small segments connection acquisition timeout was observed as below. Also, for small instances having low cpu cores, stream read pool capacity becomes low which processes upload requests slowly.
This change is to increase the capacity of stream read pool and increase connection acquisition and api timeouts.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
New functionality includes testing.All tests passNew functionality has been documented.New functionality has javadoc addedFailing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)Commit changes are listed out in CHANGELOG.md file (See: Changelog)Public documentation issue/PR createdBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.