
[Segment Replication] Create a separate thread pool for Segment Replication events #8669

Closed

Conversation

Rishikesh1159
Member

Description

This PR creates a new threadpool for segment replication.
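For context, a minimal sketch of the shape of the change, assuming the new pool follows the existing ThreadPool.Names convention (the constant and its string value here are illustrative, not copied from the PR):

// Sketch: a new named pool constant alongside GENERIC, REMOTE_REFRESH, etc.
// in org.opensearch.threadpool.ThreadPool.Names (string value is an assumption).
public static final String SEGMENT_REPLICATION = "segment_replication";

The pool itself is registered in the ThreadPool constructor; the relevant diff appears further down in this conversation.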

Related Issues

Resolves #8118

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testStaleCommitDeletionWithoutInvokeFlush
      1 org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testStaleCommitDeletionWithInvokeFlush
      1 org.opensearch.remotestore.RemoteStoreIT.testStaleCommitDeletionWithInvokeFlush

@codecov

codecov bot commented Jul 12, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (7dddb31) 70.95% compared to head (3433c26) 70.85%.
Report is 779 commits behind head on main.

❗ Current head 3433c26 differs from the pull request's most recent head f0a5b00. Consider uploading reports for commit f0a5b00 to get more accurate results.

Files                                                   Patch %   Lines
...etry/tracing/exporter/OTelSpanExporterFactory.java   71.42%    5 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8669      +/-   ##
============================================
- Coverage     70.95%   70.85%   -0.11%     
+ Complexity    57056    57030      -26     
============================================
  Files          4759     4761       +2     
  Lines        269691   269723      +32     
  Branches      39454    39455       +1     
============================================
- Hits         191359   191109     -250     
- Misses        62184    62535     +351     
+ Partials      16148    16079      -69     


@@ -272,7 +272,7 @@ public void onFailure(Exception e) {

 @Override
 protected String getThreadPool() {
-    return ThreadPool.Names.GENERIC;
+    return ThreadPool.Names.SEGMENT_REPLICATION;
Member

This component is used during ingest to determine if pressure should be applied, not to perform segrep activities. I don't think we should use the pool here.

Member Author

Yes, makes sense, will update it.

Signed-off-by: Rishikesh1159 <[email protected]>
@kotwanikunal
Member

Should we first test the difference with and without the custom ThreadPool? @mch2 / @Rishikesh1159

@Rishikesh1159
Member Author

Should we first test the difference with and without the custom ThreadPool? @mch2 / @Rishikesh1159

@kotwanikunal do you mean running some OS benchmarks with and without the custom threadpool?

@kotwanikunal
Member

Should we first test the difference with and without the custom ThreadPool? @mch2 / @Rishikesh1159

@kotwanikunal do you mean running some OS benchmarks with and without the custom threadpool?

Yes.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@@ -267,6 +270,10 @@ public ThreadPool(
     Names.REMOTE_REFRESH,
     new ScalingExecutorBuilder(Names.REMOTE_REFRESH, 1, halfProcMaxAt10, TimeValue.timeValueMinutes(5))
 );
+builders.put(
+    Names.SEGMENT_REPLICATION,
+    new ScalingExecutorBuilder(Names.SEGMENT_REPLICATION, 1, halfProcMaxAt10, TimeValue.timeValueMinutes(5))
Contributor

How are we deciding on these values? Should we be trying out a few different values to see what works best with the most common segment replication setups?

Member

@dreamer-89 dreamer-89 Jul 13, 2023

+1
I think bounding the SEGMENT_REPLICATION threadpool to a maximum of 10 threads is not correct, and it is problematic for clusters with a high number of indices. I think a benchmark with multiple indices will surface the problem in segment replication stats (higher replication lag) and in thread pool stats.

I think for segment replication the thread queue should not be bounded, or at least should be bounded at a fairly large value.
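For reference on where the cap comes from: the diff above sizes the pool with halfProcMaxAt10, which OpenSearch's ThreadPool derives from the allocated processor count. A simplified sketch of that derivation (paraphrased from memory, not the exact source):

class ScalingPoolSizing {
    // A pool built with ScalingExecutorBuilder(name, 1, halfProcMaxAt10,
    // TimeValue.timeValueMinutes(5)) scales from core=1 up to this max and
    // reclaims idle threads after the 5-minute keep-alive.

    // clamp value into [min, max]
    static int boundedBy(int value, int min, int max) {
        return Math.min(max, Math.max(min, value));
    }

    // half the allocated processors, rounded up, clamped to [1, 10]
    static int halfAllocatedProcessorsMaxTen(int allocatedProcessors) {
        return boundedBy((allocatedProcessors + 1) / 2, 1, 10);
    }
}

So an 8-core node gets a max of 4 segrep threads, and nodes with roughly 20 or more cores hit the cap of 10. Note the cap bounds threads, not the queue: as far as I know, a scaling executor's work queue is unbounded, so excess segrep tasks queue up rather than being rejected.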

Member

@Rishikesh1159: Identifying the ideal maximum thread count is tricky and will depend on cluster load and usage. I think we need to set this to a number sufficient to handle the busiest traffic/ingestion patterns without bottlenecking on threads in the pool. One way to get more insight into this number is to run an experiment (a sketch for reading the thread usage follows this list):

  1. Set the max thread count to a very high value (512?)
  2. Create X indices (start with 1?)
  3. Simulate traffic on all X indices
  4. Repeat 1-3 for higher values of X

The max/avg thread usage counts will give us an idea of this number.
Please note, we need ingestion happening on all indices simultaneously to trigger parallel rounds of segment replication and increased thread usage.
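A minimal sketch of how thread usage could be read during such an experiment, assuming access to a node's ThreadPool instance and the SEGMENT_REPLICATION name this PR adds (the same counters are exposed over REST via the _cat/thread_pool and node stats APIs):

import org.opensearch.threadpool.ThreadPool;
import org.opensearch.threadpool.ThreadPoolStats;

public final class SegrepPoolProbe {
    // Poll the live stats for the segrep pool; "largest" is the high-water
    // mark of pool threads, i.e. the max-usage signal the experiment is after.
    static void printSegrepPoolStats(ThreadPool threadPool) {
        for (ThreadPoolStats.Stats stats : threadPool.stats()) {
            if (ThreadPool.Names.SEGMENT_REPLICATION.equals(stats.getName())) {
                System.out.printf(
                    "active=%d largest=%d queue=%d rejected=%d%n",
                    stats.getActive(), stats.getLargest(), stats.getQueue(), stats.getRejected()
                );
            }
        }
    }
}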


@dreamer-89
Member

Gradle Check (Jenkins) Run Completed with:

I think you need to update your local branch with the latest main, which identifies 2.10.0 as the new bwc minor version.

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':distribution:bwc:minor:buildBwcLinuxTar'.
> Building 2.9.0 didn't generate expected file /var/jenkins/workspace/gradle-check/search/distribution/bwc/minor/build/bwc/checkout-2.x/distribution/archives/linux-tar/build/distributions/opensearch-min-2.9.0-SNAPSHOT-linux-x64.tar.gz

@Rishikesh1159
Member Author

Rishikesh1159 commented Jul 19, 2023

Thanks for the review @kotwanikunal @Poojita-Raj @dreamer-89 @mch2. I ran OS benchmarks to see whether there is any increase or decrease in performance when using a separate threadpool. Here are the results:

Using Segrep Threadpool:

Operation   Index Latency   Index Throughput   index-stats-wait-until-merges-finish latency
P0          -               29170              -
P50         1205            30973              104,121
P90         1751            -                  833,121
P99         2550            -                  833,121
P100        4239            35415              833,121

Using GENERIC threadpool:

Operation   Index Latency   Index Throughput   index-stats-wait-until-merges-finish latency
P0          -               29,768             -
P50         1222            31160              774,913
P90         1751            -                  962,163
P99         2404            -                  962,163
P100        3526            36,143             962,163

In summary, there is around a 20% increase in indexing latency for the P100 metric. As @dreamer-89 and @Poojita-Raj suggested, I feel a maximum of 10 threads for the segrep threadpool might not be enough when a single node hosts a high number of replicas and indices; capped at a max of 10 threads, we might see performance degradation. The GENERIC threadpool usually has a max of around 512 threads (shared by many other tasks), which might be better in some cases than a dedicated segrep threadpool. One thing we can do is increase the max number of threads in the segrep threadpool, but we don't know exactly what the optimal number is, as it varies with the workload being executed. We definitely need to do more benchmark runs with different workloads to gather data before we decide on the optimal max thread count for the segrep threadpool.

To be clear, the main objective of creating a separate threadpool for segrep is monitoring segrep-related tasks, not any performance gain.

Please share if you have any other thoughts.

@Rishikesh1159
Member Author

Rishikesh1159 commented Jul 26, 2023

I have done a few more benchmark runs. Here are my observations.

-> As I increased the number of shards to 20, replicas to 3, and nodes to 4, the degradation in P100 index latency with the separate segment replication threadpool was even worse than before. So we can say that as shard and replica counts increase, the P100 index latency gets worse. Below are the results:

  • Using separate segrep threadpool, 20 shards, 3 replicas, 4 nodes (max of 10 threads in pool):

    Operation   Index Latency   Index Throughput   index-stats-wait-until-merges-finish latency
    P0          -               33687              -
    P50         1,080           35,677             56914
    P90         1445            -                  107,319
    P99         2,029           -                  107,319
    P100        5,106           42,487             107,319

  • Using GENERIC threadpool (no separate segrep threadpool), 20 shards, 3 replicas, 4 nodes:

    Operation   Index Latency   Index Throughput   index-stats-wait-until-merges-finish latency
    P0          -               34039              -
    P50         1,071.6         36287              70617
    P90         1411            -                  93,227
    P99         1944            -                  93,227
    P100        3,665.8         41927              93,227

-> I increased the max number of threads in the segrep threadpool to 20 and then 50, and I no longer see the P100 index latency degradation. Here are the results:

  • Using separate segrep threadpool, 20 shards, 3 replicas, 4 nodes (max of 20 threads in pool):

    Operation   Index Latency   Index Throughput   index-stats-wait-until-merges-finish latency
    P0          -               34,772             -
    P50         1,075           36,125             60940
    P90         1382            -                  115,963
    P99         1,979           -                  115,963
    P100        3668            41227              115,963

  • Using separate segrep threadpool, 20 shards, 3 replicas, 4 nodes (max of 50 threads in pool):

    Operation   Index Latency   Index Throughput   index-stats-wait-until-merges-finish latency
    P0          -               35,734             -
    P50         1,043           36825              110498
    P90         1349            -                  154,722
    P99         1,935           -                  154,722
    P100        3407            41,882             154,722

-> Next, I changed the execution of the replication runner from the GENERIC threadpool to the segrep threadpool. I changed here and here. All 5 transport layer calls related to segment replication also use the segrep threadpool, and the max number of threads in the pool is set to only 10. Here are the results (a sketch of this dispatch change follows the table):

  • Using separate segrep threadpool for transport calls and replication runner, 20 shards, 3 replicas, 4 nodes (max of 10 threads in pool):

    Operation   Index Latency   Index Throughput   index-stats-wait-until-merges-finish latency
    P0          -               34,966             -
    P50         1,067           36,762             95531
    P90         1,379           -                  159,398
    P99         1896            -                  159,398
    P100        3497            42115              159,398
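For reference, a sketch of the runner dispatch change described above (illustrative only; replicationRunner stands in for the actual replication Runnable, and the real call sites are the two linked locations):

import java.util.concurrent.ExecutorService;
import org.opensearch.threadpool.ThreadPool;

// Submit replication work to the dedicated pool instead of the shared
// GENERIC pool, so segrep work is isolated and visible under its own name.
void startReplication(ThreadPool threadPool, Runnable replicationRunner) {
    // previously, roughly: threadPool.generic().execute(replicationRunner);
    ExecutorService executor = threadPool.executor(ThreadPool.Names.SEGMENT_REPLICATION);
    executor.execute(replicationRunner);
}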

Summary:

As we increase the number of shards and replicas in the cluster, the P100 index latency gets worse if we use a separate segrep threadpool (transport layer calls only) with a max of 10 threads. But if we increase the max number of threads in the separate segrep threadpool, we no longer see the index latency degradation. Also, if we move the replication runners onto the separate segrep threadpool as well, we don't see any degradation in index latency.

I will update this thread as I get new findings from benchmark runs. Thanks @dreamer-89 for your suggestion. I will try a few more rounds of benchmarks with the configuration you suggested.

Collaborator

@Bukhtawar Bukhtawar left a comment

+1 to understanding how these threads are going to be used. Are they compute-bound? If so, we might need to run experiments to decide on the precise size.

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled (Issues that have stalled) label Aug 26, 2023
@opensearch-trigger-bot
Contributor

This PR was closed because it has been stalled for 7 days with no activity.

@kotwanikunal
Member

Apologies. This PR was auto closed without reaching a resolution from the maintainers.
Re-opening to move it forward.
Thanks for your contributions to OpenSearch!

@kotwanikunal kotwanikunal reopened this Sep 14, 2023
@github-actions github-actions bot added the distributed framework, enhancement (Enhancement or improvement to existing feature or request), and Indexing:Replication (Issues and PRs related to core replication framework eg segrep) labels Sep 14, 2023
@kotwanikunal kotwanikunal removed the stalled (Issues that have stalled) label Sep 14, 2023
@github-actions
Contributor

Compatibility status:

Checks if related components are compatible with change f0a5b00

Compatible components: []

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added and then removed the stalled (Issues that have stalled) label Jan 13, 2024
@mch2
Member

mch2 commented Jan 29, 2024

Closing this as it has stalled.

@mch2 mch2 closed this Jan 29, 2024
Labels
enhancement (Enhancement or improvement to existing feature or request), Indexing:Replication (Issues and PRs related to core replication framework eg segrep), skip-changelog
Development

Successfully merging this pull request may close these issues.

[Segment Replication] Create separate thread pool for Segrep
7 participants