[Segment Replication] Create a separate thread pool for Segment Replication events #8669
Conversation
Signed-off-by: Rishikesh1159 <[email protected]>
Gradle Check (Jenkins) Run Completed with:
Codecov Report

@@             Coverage Diff              @@
##               main    #8669      +/-   ##
============================================
- Coverage     70.95%   70.85%    -0.11%
+ Complexity    57056    57030       -26
============================================
  Files          4759     4761        +2
  Lines        269691   269723       +32
  Branches      39454    39455        +1
============================================
- Hits         191359   191109      -250
- Misses        62184    62535      +351
+ Partials      16148    16079       -69

☔ View full report in Codecov by Sentry.
@@ -272,7 +272,7 @@ public void onFailure(Exception e) {

     @Override
     protected String getThreadPool() {
-        return ThreadPool.Names.GENERIC;
+        return ThreadPool.Names.SEGMENT_REPLICATION;
This component is used during ingest to determine if pressure should be applied, not to perform segrep activities. I don't think we should use the pool here.
Yes, makes sense, will update it.
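For context, a minimal sketch of the revert discussed here, assuming the handler shape shown in the diff above; the class name below is a placeholder, since the diff excerpt does not show the enclosing class:

```java
import org.opensearch.threadpool.ThreadPool;

// Sketch only: the ingest-time pressure check stays on the shared GENERIC pool,
// since it does not perform segment replication work itself.
// "PressureCheckHandlerSketch" is a hypothetical name, not a class from this PR.
abstract class PressureCheckHandlerSketch {
    protected String getThreadPool() {
        return ThreadPool.Names.GENERIC;
    }
}
```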
Signed-off-by: Rishikesh1159 <[email protected]>
Should we first test the difference with and without the custom ThreadPool? @mch2 / @Rishikesh1159
@kotwanikunal do you mean running some OS benchmarks with and without the custom threadpool?
Yes.
Gradle Check (Jenkins) Run Completed with:
@@ -267,6 +270,10 @@ public ThreadPool(
             Names.REMOTE_REFRESH,
             new ScalingExecutorBuilder(Names.REMOTE_REFRESH, 1, halfProcMaxAt10, TimeValue.timeValueMinutes(5))
         );
+        builders.put(
+            Names.SEGMENT_REPLICATION,
+            new ScalingExecutorBuilder(Names.SEGMENT_REPLICATION, 1, halfProcMaxAt10, TimeValue.timeValueMinutes(5))
+        );
How are we deciding on these values? Should we be trying out a few different values to see what works best with the most common segment replication setups?
+1
I think bounding the SEGMENT_REPLICATION threadpool to a max of 10 threads is not correct and is problematic for clusters with a high number of indices. I think with a benchmark across multiple indices, the problem will surface in segment replication stats (higher replication lag) and in thread pool stats.
I think for segment replication, the thread count should not be bounded, or at least the bound should be a fairly large value.
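One possible direction, sketched under the assumption that the pool keeps the ScalingExecutorBuilder registration shown in the diff above: derive the max from the node's allocated processors instead of hard-coding halfProcMaxAt10. The multiplier below is an illustrative assumption, not a value proposed in the PR:

```java
// Sketch: register the segrep pool with a larger, processor-derived upper bound.
// "allocatedProcessors" is the value the ThreadPool constructor already computes
// for the other pools; the "* 2, minimum 16" bound is an assumption for illustration.
final int segrepMaxThreads = Math.max(allocatedProcessors * 2, 16);
builders.put(
    Names.SEGMENT_REPLICATION,
    new ScalingExecutorBuilder(Names.SEGMENT_REPLICATION, 1, segrepMaxThreads, TimeValue.timeValueMinutes(5))
);
```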
@Rishikesh1159: Identifying the ideal maximum thread count is tricky and will depend on cluster load and usage. I think we need to set this to a number sufficient to handle the busiest traffic/ingestion patterns without bottlenecking on threads in the pool. One way to get more insight into this number is to run an experiment:
- Max thread count to a very high value (512?)
- Create X indices (start with 1 ?)
- Simulate traffic on all X indices
- Repeat 1-3 for higher values of X
The max/avg thread usage counts will give us an idea of this number.
Please note, we need to have ingestion happening on all indices simultaneously to trigger parallel rounds of segment replication and increased thread usage.
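A rough sketch of how the max/avg thread usage in this experiment could be sampled, using the existing ThreadPool.stats() API; the helper class, sampling loop, and use of the pool-name constant added by this PR are illustrative assumptions:

```java
import org.opensearch.threadpool.ThreadPool;
import org.opensearch.threadpool.ThreadPoolStats;

// Sketch: periodically sample the active thread count of the segrep pool
// while ingestion is running on all X indices, then report max and average.
// ThreadPool.Names.SEGMENT_REPLICATION is the constant introduced in this PR.
final class SegrepPoolUsageSampler {
    static void sample(ThreadPool threadPool, int samples, long intervalMillis) throws InterruptedException {
        int maxActive = 0;
        long totalActive = 0;
        for (int i = 0; i < samples; i++) {
            for (ThreadPoolStats.Stats stats : threadPool.stats()) {
                if (ThreadPool.Names.SEGMENT_REPLICATION.equals(stats.getName())) {
                    maxActive = Math.max(maxActive, stats.getActive());
                    totalActive += stats.getActive();
                }
            }
            Thread.sleep(intervalMillis);
        }
        System.out.println("max active=" + maxActive + ", avg active=" + ((double) totalActive / samples));
    }
}
```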
I think you need to update your local branch with the latest changes.
Thanks for the review @kotwanikunal @Poojita-Raj @dreamer-89 @mch2. I ran OS benchmarks to see whether there would be any increase or decrease in performance when using the separate threadpool. Here are the results:
Using Segrep Threadpool
Using GENERIC threadpool
Summary: there is around a 20% increase in indexing latency for the P100 metric. As @dreamer-89 and @Poojita-Raj suggested, I feel that a maximum of 10 threads for the segrep threadpool might not be enough when a single node has a high number of replicas and indices; capping at a max of 10 threads, we might see performance degradation. The GENERIC threadpool usually has a max of around 512 threads (shared by many other tasks), which might be better in some cases than having a segrep threadpool. One thing we can do is increase the max number of threads in the segrep threadpool, but we don't know exactly what the optimal number of threads is, as it varies with the workload being executed. We definitely need to do more benchmark runs with different workloads to gather data before we decide on the optimal max number of threads for the segrep threadpool. To be clear, the main objective of creating a separate threadpool for segrep is monitoring segrep-related tasks, not any performance gain. Please share if you have any other thoughts.
I have done a few other benchmark runs. Here are my observations:
- As I increased the number of shards to 20, replicas to 3, and nodes to 4, the degradation in index latency for the P100 metric with the separate segment replication threadpool was even worse than before. So we can say that as the shard and replica count increases, the P100 index latency metric becomes worse. Below are the results:
- I tried increasing the max number of threads in the segrep threadpool to 20 and to 50, and I no longer see degradation in P100 index latency. Here are the results:
- Next I tried changing the execution of the replication runner from the GENERIC threadpool to the segrep threadpool (I changed here and here; see the sketch below). All 5 transport-layer calls related to segment replication are also using the segrep threadpool, and the max number of threads in the threadpool is set to only 10. Here are the results:

Summary: As we increase the number of shards and replicas in the cluster, the index latency for the P100 metric gets worse if we use a separate segrep threadpool (only transport-layer calls) with a max of 10 threads in the threadpool. But if we increase the max number of threads in the separate segrep threadpool, we don't see the index latency degradation. Also, if we run the replication runners on the separate segrep threadpool, we don't see any degradation in index latency. I will update this task as I have new findings from benchmark runs. Thanks @dreamer-89 for your suggestion. I will try a few more rounds of benchmarks with the configuration you suggested.
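For reference, a hypothetical sketch of the kind of change described in the last bullet (running the replication runner on the segrep pool instead of GENERIC); the actual call sites are only linked in the original comment, and the runnable below is a placeholder, not code from this PR:

```java
// Sketch only: dispatch the replication work onto the dedicated segrep pool.
// "replicationRunner" stands in for the existing runnable that performs a
// round of segment replication; the lambda body is a placeholder.
Runnable replicationRunner = () -> { /* existing replication logic */ };
threadPool.executor(ThreadPool.Names.SEGMENT_REPLICATION).execute(replicationRunner);
```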
+1 to understanding how these threads are going to be used. Are they compute bound? If yes, we might need to run experiments to decide on the precise size.
This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.
This PR was closed because it has been stalled for 7 days with no activity.
Apologies. This PR was auto-closed without reaching a resolution from the maintainers.
Compatibility status: Checks if related components are compatible with change f0a5b00
Incompatible components
Skipped components
Compatible components: []
Gradle Check (Jenkins) Run Completed with:
This PR is stalled because it has been open for 30 days with no activity.
Closing this as it has stalled.
Description
This PR creates a new threadpool for segment replication.
Related Issues
Resolves #8118
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.