[RFC] Dynamically enable concurrent search on search requests at runtime #14781

Open
Gankris96 opened this issue Jul 17, 2024 · 9 comments
Labels: enhancement, Roadmap:Cost/Performance/Scale, Search:Performance, Search

Gankris96 (Contributor) commented Jul 17, 2024

Is your feature request related to a problem? Please describe

TL;DR

I am proposing a mechanism to increase the adoption of concurrent segment search by dynamically enabling it for users at runtime, on a per-request basis, based on certain factors such as request type and resource availability. We are starting with CPU utilization, with the ability to add more deciding parameters in the future.

Background Overview

With the introduction of Concurrent Segment Search we have seen good improvements in search performance.
However, the feature is currently disabled by default, and users have to enable it manually via either a cluster-level or an index-level setting (ref).
This means that users need to take manual action to enable concurrent search for their workloads in OpenSearch.

However, based on the performance benchmarking done, we have the following observations:

  1. For certain types of queries, such as aggregation queries, we generally see an improvement in search latency.
  2. For short-running queries with latency in the single-digit milliseconds, concurrent search doesn't show much improvement.
  3. CPU utilization increases with concurrent search, and increases linearly with slice count.

Given these observations, we can enable concurrent segment search whenever possible to efficiently utilize resources and gain performance benefits.
Any overrides that the user has set will be honored; this feature is catered more towards cases where the concurrent search feature has not been explicitly set by the user.

Describe the solution you'd like

At a high level, introduce an additional decision layer that runs as a separate component and monitors the relevant metrics, so that a decision can be retrieved as a simple in-memory lookup at runtime during a search request. This follows the tenet that dynamic decision making needs to be fast and must not add latency to the search request.
Each search request will query this decision layer to determine whether to execute via the concurrent search path or not. The decision is made at the node level, which means each shard search request gets an independent decision.

The decision layer would be a composite decider, similar to AllocationDecider for example, with multiple factors each contributing a decision. The initial version would likely have a request-type decider and a resource-utilization-based decider. The request-type decider initially targets specific types of requests, such as aggregations, with the possibility of expanding to more request types in the future; this gives us some initial control over enabling the feature (a minimal sketch of this composite structure is shown below).
However, for the rest of this proposal, I will focus on the non-trivial logic of resource-based decision making.
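
To make the composite decider concrete, here is a minimal sketch in Python (all class and method names are hypothetical, only loosely mirroring how AllocationDecider composes per-decider verdicts):

from enum import Enum

class Decision(Enum):
    YES = 1          # decider votes for concurrent search
    NO = 2           # decider vetoes concurrent search
    NO_OPINION = 3   # decider abstains

class ConcurrentSearchDecider:
    # Base class; concrete deciders override decide().
    def decide(self, request_context):
        return Decision.NO_OPINION

class RequestTypeDecider(ConcurrentSearchDecider):
    # Initially target only aggregation requests.
    def decide(self, request_context):
        if request_context.has_aggregations:
            return Decision.YES
        return Decision.NO_OPINION

class CompositeDecider:
    def __init__(self, deciders):
        self.deciders = deciders

    def should_use_concurrent_search(self, request_context):
        saw_yes = False
        for decider in self.deciders:
            decision = decider.decide(request_context)
            if decision is Decision.NO:     # any veto wins
                return False
            saw_yes = saw_yes or decision is Decision.YES
        return saw_yes                      # off by default unless a decider opts in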

Given this abstraction, I am proposing the following algorithm for dynamic decision making:

The decider monitors CPU utilization as well as overall search latency and concurrent search latency over a sliding window of size N.
Along with this, the decider will have an initial percentage X% that controls how many requests in a time window are allowed to execute with concurrent segment search. This initial percentage can be set based on the SearchRate metric of the cluster, with a starting value close to the minimum SearchRate.

The decision algorithm would look like this:

If CPU_max over the window is below a threshold CPU_high, then there is sufficient resource availability to enable concurrent search. By default, slice_count will be set to 2.
We can start by enabling concurrent search for X% of the requests and continue monitoring CPU as well as search latencies (both overall search latency and concurrent search latency).

The decider will continue to monitor the metrics; if CPU_max is still lower than CPU_high, it increases X% by an enabling factor E. However, if CPU_max is greater than or equal to CPU_high, it decreases X% by a decreasing factor D. To react faster to resource constraints, we can set D = 1.5 * E.

The decider also monitors concurrent search latency and overall search latency over the window N. If the average concurrent search latency exceeds the average overall search latency by more than a threshold diff_threshold, then concurrent search may not be providing performance benefits, so we reduce X% by the factor D again.
In this way, we penalize both performance degradation and resource constraints, and we enable concurrent search in scenarios where CPU is available and performance improvements are seen.

The algorithm can also be smart about increasing the slice count (up to a maximum MAX_SLICE) once most queries are running in concurrent mode, determined by X exceeding a threshold percentage while CPU is still available to further improve performance. MAX_SLICE can be set based on the instance type. Similarly, slice_count can be decreased (down to a minimum MIN_SLICE = 2) if X goes below a certain threshold.

Decision loop:
    if CPU_max < CPU_high:
        # increase percentage by enabling factor E
        X = X + (X * E)
    else:
        # decrease percentage by decreasing factor D
        X = X - (X * D)

    # penalize if concurrent search latency regresses past the threshold
    diff = avg_concurrent_search_latency - avg_overall_search_latency
    if diff > diff_threshold:
        X = X - (X * D)

    if X > T:
        # increase slice count, capped at MAX_SLICE
        slice_count = min(MAX_SLICE, slice_count * 2)
    if X < B:
        # decrease slice count, floored at MIN_SLICE (= 2)
        slice_count = max(MIN_SLICE, slice_count / 2)

All of the variables in the above algorithm can be configured via cluster settings.
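
As an illustration, here is a minimal runnable sketch of the resource-based decider loop above (the default values, the sliding-window mechanics, and the admission check are assumptions for the example, not final design):

import random
from collections import deque

class ResourceBasedDecider:
    def __init__(self, window_size=100, cpu_high=0.75, E=0.05,
                 diff_threshold_ms=5.0, T=0.90, B=0.10,
                 min_slice=2, max_slice=8):
        self.cpu_window = deque(maxlen=window_size)  # sliding window of CPU samples
        self.cpu_high = cpu_high
        self.E = E                # enabling factor
        self.D = 1.5 * E          # decreasing factor; reacts faster to constraints
        self.diff_threshold_ms = diff_threshold_ms
        self.T, self.B = T, B     # thresholds for slice-count adjustment
        self.min_slice, self.max_slice = min_slice, max_slice
        self.X = 0.10             # fraction of requests admitted to the concurrent path
        self.slice_count = min_slice

    def record_cpu(self, utilization):
        self.cpu_window.append(utilization)

    def update(self, avg_concurrent_latency_ms, avg_overall_latency_ms):
        if not self.cpu_window:
            return
        cpu_max = max(self.cpu_window)
        if cpu_max < self.cpu_high:
            self.X = min(1.0, self.X * (1 + self.E))  # CPU headroom: grow X%
        else:
            self.X = max(0.0, self.X * (1 - self.D))  # CPU pressure: shrink X% faster
        # Penalize when concurrent search is not actually helping latency.
        if avg_concurrent_latency_ms - avg_overall_latency_ms > self.diff_threshold_ms:
            self.X = max(0.0, self.X * (1 - self.D))
        # Adapt slice count, capped at max_slice and floored at min_slice.
        if self.X > self.T:
            self.slice_count = min(self.max_slice, self.slice_count * 2)
        elif self.X < self.B:
            self.slice_count = max(self.min_slice, self.slice_count // 2)

    def should_use_concurrent_search(self):
        return random.random() < self.X  # admit roughly X% of requests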

Related component

Search:Performance

Describe alternatives you've considered

Multi-armed bandit approach

https://en.wikipedia.org/wiki/Multi-armed_bandit

In the multi-armed bandit approach (in our case a 2-armed bandit), we try to balance exploration (sometimes trying the concurrent_search path) and exploitation (exploiting scenarios where we see good benefit from enabling concurrent search):

  1. Monitor CPU and latency metrics, as in the approach above, over a sliding window of size N to keep track of CPU_max for the window.
  2. Have a threshold CPU_high for CPU utilization.
  3. The bandit algorithm makes its decision based on two factors, using the ε-greedy approach to balance exploration vs. exploitation:
    1. Exploration: randomly route some requests to the concurrent_search path with a small probability.
    2. Exploitation: use a reward trade-off comparing concurrent search efficiency vs. non-concurrent search efficiency.
  4. The initial reward is 0 for both concurrent and non-concurrent search.
  5. Then, based on how search requests perform, we grant rewards to either of the two paths (concurrent vs. non-concurrent).
  6. The reward is calculated as follows:
    a. Concurrent search reward: total reward received for concurrent search divided by total requests executed on the concurrent path.
    b. Non-concurrent search reward: total reward received for non-concurrent search divided by total requests executed on the non-concurrent path.
  7. Upon request completion, based on the monitored metrics, the rewards are updated as follows: if CPU_max < CPU_high, grant a reward E; if CPU_max >= CPU_high, grant a negative reward D. The non-concurrent path always gets the positive reward E. Once again, we can set D = 1.5 * E. Also, penalize the concurrent search reward if the average concurrent search latency exceeds the overall average latency by more than diff_threshold.
  8. The slice count can also be adaptively modified based on rewards.
# Initial rewards will be 0 for both
total_reward_non_concurrent = 0
total_reward_concurrent = 0

function select_path():
    # exploration
    if random < epsilon:
        return concurrent_path
    # exploitation: take whichever path has been earning the higher average
    # reward, i.e. showing better performance
    if (total_reward_concurrent / concurrent_count) > (total_reward_non_concurrent / non_concurrent_count):
        return concurrent_path
    return non_concurrent_path

Decider logic:
    path = select_path()
    return path

Post decision update:
# post path selection, update variables for subsequent requests
if path == concurrent_path:
    concurrent_count += 1
    if CPU_max < CPU_high:
        total_reward_concurrent += E
    else:
        total_reward_concurrent -= D
    diff = avg_concurrent_search_latency - avg_overall_search_latency
    if diff > diff_threshold:
        total_reward_concurrent -= D
else:
    non_concurrent_count += 1
    total_reward_non_concurrent += E
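
For comparison, here is a self-contained runnable ε-greedy sketch of the bandit alternative (the epsilon value, reward constants, and metric arguments are illustrative assumptions); it also guards against division by zero before either path has been tried:

import random

EPSILON = 0.1  # exploration probability
E = 1.0        # positive reward
D = 1.5 * E    # penalty; reacts faster to regressions

stats = {
    "concurrent":     {"reward": 0.0, "count": 0},
    "non_concurrent": {"reward": 0.0, "count": 0},
}

def avg_reward(path):
    s = stats[path]
    return s["reward"] / s["count"] if s["count"] else 0.0

def select_path():
    if random.random() < EPSILON:  # exploration
        return "concurrent"
    # Exploitation: take whichever path has earned the higher average reward.
    if avg_reward("concurrent") > avg_reward("non_concurrent"):
        return "concurrent"
    return "non_concurrent"

def update(path, cpu_max, cpu_high,
           avg_concurrent_ms, avg_overall_ms, diff_threshold_ms):
    stats[path]["count"] += 1
    if path == "non_concurrent":
        stats[path]["reward"] += E  # non-concurrent path always earns E
        return
    stats[path]["reward"] += E if cpu_max < cpu_high else -D
    if avg_concurrent_ms - avg_overall_ms > diff_threshold_ms:
        stats[path]["reward"] -= D  # also penalize latency regressions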

Comparing the proposed approach with the current version of the multi-armed bandit approach, the two seem quite similar, but the multi-armed bandit approach has the ability to take the concurrent path even when the CPU is under duress. Also, in the first approach, we can think of monitoring CPU_max as exploration, and increasing the request percentage X% by E or decreasing it by D as a form of reward and, by extension, a form of exploitation.
Reinforcement learning approaches work better when the state of the system is well represented, and in the current version we don't have fine-grained representations of the state, one example being a detailed representation of the request. Furthermore, the efficiency of the multi-armed bandit comes down strongly to how robust the reward function is.

Additional context

Co-Author: @sohami

jed326 (Collaborator) commented Jul 18, 2024

Thanks @Gankris96 for the RFC!

The way I see it we're trying to address a few different things at once here:

  1. Increase adoption of concurrent segment search (Elasticsearch, for example, comes with their version enabled by default: https://www.elastic.co/search-labs/blog/elasticsearch-opensearch-vector-search-performance-comparison#out-of-the-box-and-concurrent-segment-search)
  2. Do so in a way that does not lead to performance regressions for users, whether it be due to query type or due to resource utilization.

I think something like #9491 will be a great first step in terms of adoption to let users more easily try out concurrent segment search on their existing workloads.

Some specific questions:

At a high level, introduce an additional decision layer that runs as a separate component and monitors the relevant metrics, so that a decision can be retrieved as a simple in-memory lookup at runtime during a search request. This follows the tenet that dynamic decision making needs to be fast and must not add latency to the search request.

Are you thinking this decision layer would be provided as a plugin or as a part of core? Or could it even be implemented by something like adding plugin hooks to the existing search backpressure interfaces?

It would be great to get a list of the tuning knobs you are thinking would be available to the user. As it is, search backpressure has so many tuning knobs that it isn't clear what each knob even does, so it would be best to be as concise and precise as possible with these.


@reta I know you had some thoughts you shared during the weekly search meeting as well; it would be great if you could share them here too.

reta (Collaborator) commented Jul 19, 2024

@reta I know you had some thoughts you shared during the weekly search meeting as well; it would be great if you could share them here too.

Thanks @jed326, you have perfectly outlined the same concerns I have; +1 to starting with #9491 as the low-hanging fruit.

Gankris96 (Contributor Author) commented

Thanks @jed326 and @reta.
I agree that the request-level parameter #9491 would be helpful as a first step. However, my thinking here is to minimize user-level involvement as much as possible.
I see it helping in cases where the user doesn't have access to modify cluster settings, for example. However, the user has to identify the right category of requests to use this parameter with; if not, it could potentially lead to resource utilization spiking.

In terms of the decision layer itself, I envisioned it to be within core, probably similar to the search backpressure service, because I don't know that there would be use for this outside of the search flow. We can provide the capability to have pluggable deciders that add onto the existing deciders.

sohami (Collaborator) commented Jul 23, 2024

I agree that the request-level parameter #9491 would be helpful as a first step. However, my thinking here is to minimize user-level involvement as much as possible.

+1 to this. The goal here is to reduce user involvement in deciding between the concurrent and sequential execution paths. The OpenSearch execution layer should be able to make that choice for the user instead of the user driving it (which is what the index- or cluster-level option provides).

We can provide the capability to have pluggable deciders that add onto the existing deciders.

We will need to provide a pluggable mechanism such that other decision logic can be integrated by plugins for different use cases. We can keep the generic decision-making components, like CPU utilization, in core deciders. For example, in the case of KNN, a plugin could look at native memory usage to decide when not to enable concurrent search on more requests. It will be separate from SearchBackPressureService, as this is specific to choosing a query execution mechanism.

Gankris96 (Contributor Author) commented

I still think we should aim to build this capability to work independently of user intervention, or with minimal user intervention. Providing a request-level capability means we still rely on the user sending requests with that parameter, which defeats the purpose. I am going to look into the low-level design for this, so please let me know if you have further thoughts/concerns.
The decision layer would be within core, with a pluggable interface for external plugins to add their own deciders, and the combination of these deciders will dynamically toggle concurrent search on/off.

@jed326 @reta @sohami

reta (Collaborator) commented Jul 30, 2024

I still think we should aim to build this capability to work independently of user intervention, or with minimal user intervention.

@Gankris96 I don't think anyone is arguing with that. I think the concern is that there are no clear metrics / measurements / signals to build that right now. E.g., I see only CPU mentioned here, but I think it is far from sufficient: we don't look at the utilization of the thread pool that concurrent search uses (it may just queue all the work). This is just one example.

What I think is missed here is:

  1. a computational model for concurrent search
  2. index geometry

I am sure there is much more to that.

Gankris96 (Contributor Author) commented

Thanks @reta. I did think about thread pool utilization too, but I don't think there is a direct correlation with performance, or at least a visible one. Do we need to care about the thread pool queue growing if search latency is still improving anyway? If there really is a performance hit, I'd expect it to show up as the average concurrent search latency getting worse, and the decider would detect that and reduce the number of requests run with concurrent segment search, so there would be no performance degradation due to queuing. Do you see other issues with this?

Currently, the major limiting factor for concurrent search, based on our benchmarking, is resource availability, and specifically CPU.

Anyway, I was thinking of this as an initial step towards dynamically enabling concurrent search under certain conditions to increase the feature's adoption. Additional decision parameters can be added later to make the decision making more comprehensive.

Let me think more about the computational model for concurrent search and the index geometry points you brought up as well.

jed326 (Collaborator) commented Jul 31, 2024

E.g., I see only CPU mentioned here, but I think it is far from sufficient: we don't look at the utilization of the thread pool that concurrent search uses (it may just queue all the work).

I think this is fine for now: since it's called out that we will expose pluggable deciders, we can keep refining this system with additional parameters in the future.

Gankris96 (Contributor Author) commented

Created a sub-issue that tracks part of the changes described in this issue: #15259
