[Proposal] Tiered caching - OpenSearch #10024
Comments
We performed some POCs with a disk-tier-based cache. Will add numbers here.
For the Phase 2 tests, given that it's just running 5 queries, wouldn't any setup that allocates enough cache to accommodate those 5 queries (with no updates being processed) yield a significant improvement? I'm a little concerned that increasing the cache to hold all the queries in the benchmark makes the benchmark less useful as a measure of code performance, since the code stops being used -- results are just served from the cache.
It is not just 5 (actually 6) queries per se. We used 6 query templates, randomized their field values, and ran those in a loop (~4000 times) using 6 parallel clients (~72k queries in total), so the number of unique queries was much larger.
Tiered caching's usefulness is in providing a much larger cache for use cases where we see some repeatable queries coming in (i.e. a decent cache hit ratio). Currently the in-memory cache size is limited by the amount of memory available on the node, so it needs to evict items; if we had unlimited memory available, we could have just used that. Also, due to our randomization logic, it might happen that query 1, which is sitting in the cache, is never called again. This benchmark tries to mimic (with a smaller in-memory cache size for simplicity) a real workload where a domain has, for example, a 50% cache hit ratio with the existing in-memory cache (as seen from the requestCache node stats), and measures the performance of adding disk tier support with the same workload. With tiered support we can store more queries, effectively increasing the hit ratio and seeing gains. Tiered caching might not make sense for cases where there is hardly any cache hit ratio or no evictions are seen in the existing in-memory cache (as called out above).

Also, regarding testing with updates: invalidation effectively happens during refreshes. The request cache uses Lucene's IndexReader.CacheKey as one of the keys, so if you hit the same query again after a refresh, a different key is generated/stored, making the older key/value stale. We can also replicate this scenario by randomizing query values to generate different keys.

Sample request stats from our experiments, running the same workload against the below 2 use cases:
With additional disk tier support
Thanks @sgup432 for the added details! I do still worry that any benchmarks will have difficulty measuring the added benefit of increased caching, since it heavily depends on the workload executed. Regardless of how many queries are in the benchmark, there will be some cache size that will fit them all, at which point the cache hit ratio is just a function of how many times the queries are executed (since they'll all get cached after the first run). I'm definitely not trying to argue against tiered caching -- I think it's a neat idea to increase the size of the query cache. (I'm skeptical of adding a remote cache tier unless the index is readonly. That said, if the index is readonly, I think there could be some neat opportunities to precompute caches for frequently co-occurring clauses based on historical queries.) I am trying to say that the specific numbers from benchmarks probably can't be trusted (except to give a theoretical upper bound on potential speedup), though.
Agree that it is dependent on the kind of workload being executed. We just used what we had, i.e. picking up standard nyc_taxis queries, randomizing their values, turning on the request cache (it is disabled by default), and running those queries in a loop to create a workload. It gives a good idea of the benefits we might see. In a real scenario, a domain might see queries getting cached on the first run/miss but never getting a cache hit after that. That can happen either due to invalidation (on refresh) or because the user never issues the same query again, and that controls the cache hit ratio. In our benchmark, we tried to replicate this through the extent of randomization (increasing/decreasing the range) of query field values: increasing randomization takes us toward the worst case, where we generate a lot of unique queries that never get a hit after the first run, and decreasing randomization takes us toward the best case, where we see a very high hit ratio. I was trying to replicate different cache hit ratios (by tweaking the randomization of query field values) and measure the benefit of tiered caching; the benchmark above was one of them. But I am open to suggestions if you have better ideas.

The benefits of tiered caching also depend on the query took time. For workloads with very low query latencies (simple term queries, for example), it might not make sense to cache results on disk or use tiered caching in general, as it might make performance worse. To handle this, we are taking QueryCost into consideration (as discussed in the design above) so that we don't cache these kinds of fast queries on disk. The phase-1 results also give a good overview of the latency cost of using a disk-based cache.
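For illustration only, a disk-tier admission check along these lines could gate spillover on how expensive the query was to compute. The class name and threshold below are hypothetical and do not reflect the actual QueryCost implementation:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a cost-based disk-tier admission policy; not the actual QueryCost logic.
final class TookTimePolicy {
    private final long minTookTimeNanos;

    TookTimePolicy(long minTookTimeMillis) {
        this.minTookTimeNanos = TimeUnit.MILLISECONDS.toNanos(minTookTimeMillis);
    }

    // Only admit results to the disk tier if recomputing them is likely to cost more than a disk read.
    boolean admitToDiskTier(long queryTookTimeNanos) {
        return queryTookTimeNanos >= minTookTimeNanos;
    }
}
```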
Yeah, the remote cache tier use case needs more thinking, but in general it might be useful for read-only indices as you mentioned, or for cases where users have a much larger cacheable dataset that can be stored in some cheap remote store instead of on local disk, taking the added network latency into consideration. For reference: this is the script we used to generate the workload.
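(The referenced script itself is not reproduced here.) As a rough, hypothetical sketch of the randomization idea described above, i.e. templating a query and drawing its field values from a configurable range, it might look like the following; the index and field names are placeholders:

```java
import java.util.Random;

// Minimal sketch of the query-randomization idea. This is NOT the actual workload script;
// the field name and value ranges are placeholders.
public final class RandomizedWorkload {
    private static final Random RANDOM = new Random();

    // Widening the bounds produces more unique queries (lower hit ratio);
    // narrowing them produces more repeats (higher hit ratio).
    static String randomRangeQuery(double minFare, double maxFare) {
        double from = minFare + RANDOM.nextDouble() * (maxFare - minFare);
        double to = from + 1 + RANDOM.nextInt(10);
        return "{ \"size\": 0, \"query\": { \"range\": { \"fare_amount\": { \"gte\": " + from
            + ", \"lte\": " + to + " } } } }";
    }

    public static void main(String[] args) {
        // Each generated body would be sent with ?request_cache=true against the nyc_taxis index.
        for (int i = 0; i < 5; i++) {
            System.out.println(randomRangeQuery(0, 100));
        }
    }
}
```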
Great! Will this work together with the high watermark feature that monitors disk usage of the nodes? Will the cache be implemented on the coordinating nodes or on all nodes?
We are going to take that into consideration, i.e. not cache items to disk when the available disk space on that node is running low.
This proposal is meant to extend the existing caches we have, like the request cache and query cache, by adding a disk tier.
Great, thanks!
Related RFC - #9001
Co-Author: @jainankitk
Problem Statement
OpenSearch relies heavily on various caches to speed up data retrieval, thereby providing significant improvements in search latencies. This is usually done by caching query-specific data in memory, avoiding multiple disk seeks and saving resources (CPU/JVM) by not processing the query again. But cache size is limited by the amount of memory available on a node. When dealing with larger datasets that could potentially be cached, this causes a lot of cache evictions/misses, potentially impacting performance since we need to process the query again, which in turn causes high resource consumption.
Solution
Considering the limitations of in-memory caches (as described in the above section), there could be a significant performance gain from a tiered cache solution.
Tiered caching is essentially a multi-level cache, with each tier having its own characteristics and performance levels. This lets us add more caching tiers so that we can cache a much larger dataset that is not limited by available memory.
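As a minimal sketch of what "multi-level" means here, a tiered cache can be modeled as an ordered chain of tiers that expose the same operations but differ in capacity and latency. The interface and class names below are illustrative assumptions, not the actual OpenSearch API:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical abstraction of a single cache tier; names do not reflect the actual OpenSearch API.
interface CacheTier<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
    void invalidate(K key);
}

// A tiered cache is an ordered chain of tiers: fastest (on-heap) first, then disk, then possibly remote.
final class TieredCache<K, V> {
    private final List<CacheTier<K, V>> tiers;

    TieredCache(List<CacheTier<K, V>> tiers) {
        this.tiers = tiers;
    }

    // Probe tiers in order; a hit in a lower tier could optionally be promoted to an upper tier.
    Optional<V> get(K key) {
        for (CacheTier<K, V> tier : tiers) {
            Optional<V> value = tier.get(key);
            if (value.isPresent()) {
                return value;
            }
        }
        return Optional.empty();
    }
}
```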
Tiered caching options:
Use cases
The tiered caching option is not meant for every use case. For example, where the customer is using non-cacheable queries or has a use case with frequent updates, this option may not be feasible. We try to list the use cases/scenarios in which it might be feasible.
Details around Existing caches
IndicesRequestCache: Caches the whole request result at the shard level. Not all queries are cacheable; this is decided using this logic. Overall, DFS-type search queries, range queries using now() dates, and requests with size > 0 are not cacheable.
IndicesQueryCache: Caches a DocIdSet at the segment level for a shard. It only works for non-scoring sub-queries, such as filter queries. Since it caches sub filter queries, the cached entries can be reused across queries. It uses a caching policy to decide whether or not to cache a given query.
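As a rough illustration of the request-cache eligibility rules listed above (heavily simplified; the real check in OpenSearch covers additional conditions, and the method and parameter names here are made up):

```java
// Simplified illustration of the request-cache eligibility rules mentioned above;
// not the actual OpenSearch check, which covers more conditions.
final class RequestCacheability {
    static boolean canCache(String searchType, int size, boolean usesNowInRangeQuery, boolean requestCacheEnabled) {
        if (!requestCacheEnabled) return false;
        if ("dfs_query_then_fetch".equals(searchType)) return false; // DFS-type searches are not cached
        if (size > 0) return false;                                  // only size=0 requests are cacheable
        if (usesNowInRangeQuery) return false;                       // now()-relative ranges are never stable
        return true;
    }
}
```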
High level design
Approach: Spill over evicted items to a lower-tier cache
The diagram below takes a disk-based cache as an example, but it can be extended to other tiers as well.
Here, OS in-memory cache refers to the existing OpenSearch caches we currently have, like the request cache and query cache.
Key components/features:
This component in the diagram below describes the point where the key is present in neither the in-memory nor the disk-based cache (as an example) and the value needs to be loaded using the default mechanism. In this case, loading the value means running the query and fetching the results.
Total cache miss: If there is a total cache miss for a request, i.e. the result is not present in either the in-memory or the disk-based cache, we will cache the request in the in-memory cache, the same way it happens now. Evicted items will be spilled over to a lower-tier cache, as sketched below.
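A minimal sketch of this spillover-on-eviction behavior, assuming an LRU on-heap tier that reports evictions and a map standing in for the disk tier. The class and method names are illustrative, not the actual implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch of spillover-on-eviction; not the actual OpenSearch implementation.
final class SpilloverCache<K, V> {
    private final Map<K, V> onHeap;    // upper tier (bounded, LRU)
    private final Map<K, V> diskTier;  // lower-tier stand-in; a real tier would write to disk

    SpilloverCache(int maxHeapEntries, Map<K, V> diskTier) {
        this.diskTier = diskTier;
        // accessOrder=true gives LRU ordering; entries evicted from the upper tier spill to the lower tier.
        this.onHeap = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxHeapEntries) {
                    SpilloverCache.this.diskTier.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    // On a total cache miss, compute via the loader (i.e. run the query) and store in the upper tier.
    V computeIfAbsent(K key, Function<K, V> loader) {
        V value = onHeap.get(key);
        if (value != null) {
            return value;              // in-memory hit
        }
        value = diskTier.get(key);
        if (value != null) {
            return value;              // disk hit (could also be promoted back on-heap)
        }
        value = loader.apply(key);     // miss on all tiers: load the value
        onHeap.put(key, value);
        return value;
    }
}
```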
3. Batch I/O operations: For a disk-based/remote cache, we can consider batching I/O operations to avoid the cost of expensive writes to disk, for example, which can add extra latency to the search path. We can then flush all buffered values in the background using a separate thread. We can limit the number of batched items based on entry count (A) or total size in bytes (B), whichever is breached first. This will be kept configurable. A sketch of the idea follows.
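The sketch below assumes hypothetical thresholds for entry count and total bytes and a single background flusher thread; none of the names come from the actual implementation:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.ToLongFunction;

// Illustrative sketch of batched disk-tier writes; thresholds, names, and the flush interval are hypothetical.
final class BatchingDiskWriter<K, V> {
    private final int maxBatchEntries;        // (A) flush once this many entries are buffered
    private final long maxBatchBytes;         // (B) ... or once the buffered size exceeds this, whichever comes first
    private final ToLongFunction<V> sizeOf;   // estimates the serialized size of a value
    private final Map<K, V> diskTier;         // stand-in for the real disk-backed store
    private final List<Map.Entry<K, V>> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

    BatchingDiskWriter(int maxBatchEntries, long maxBatchBytes, ToLongFunction<V> sizeOf, Map<K, V> diskTier) {
        this.maxBatchEntries = maxBatchEntries;
        this.maxBatchBytes = maxBatchBytes;
        this.sizeOf = sizeOf;
        this.diskTier = diskTier;
        // Periodic background flush so buffered entries do not sit around indefinitely.
        flusher.scheduleAtFixedRate(this::flush, 1, 1, TimeUnit.SECONDS);
    }

    // Called on the search path when an entry is spilled over; only buffers, never blocks on disk I/O.
    synchronized void write(K key, V value) {
        buffer.add(new AbstractMap.SimpleEntry<>(key, value));
        bufferedBytes += sizeOf.applyAsLong(value);
        if (buffer.size() >= maxBatchEntries || bufferedBytes >= maxBatchBytes) {
            flush();
        }
    }

    // Drains the buffer into the disk tier in one batch.
    synchronized void flush() {
        for (Map.Entry<K, V> entry : buffer) {
            diskTier.put(entry.getKey(), entry.getValue());
        }
        buffer.clear();
        bufferedBytes = 0;
    }
}
```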
Cache eviction strategies
OpenSearch uses LRU-based eviction policies for its in-memory caches (request/query cache). LRU is a popular eviction policy due to its simplicity and provides a decent cache hit rate in common scenarios, but it is known to be suboptimal: it can evict items that would have been needed in the near future because of an occasional burst of other items.
There are other eviction policies, like ARC (Adaptive Replacement Cache) and Window TinyLFU, which are proven to perform better than LRU. But for the initial phases we will stick to the existing LRU-based policy and consider different policies in later phases.
Cache cleanup
Whenever an invalidation happens, stale keys are left hanging in the cache. Currently, OpenSearch has a background job which runs every minute to clean up the stale keys lying in the cache. It collects stale keys by listening to IndexReader close events. We can use a similar mechanism for the lower-tier cache as well, as sketched below.
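A sketch of that mechanism applied to a lower tier: collect keys whose IndexReader has closed and sweep them on a schedule. The one-minute interval mirrors the existing in-memory cleanup; the class and method names are illustrative, not from the actual code:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of stale-key cleanup for a lower cache tier; names are not from the actual code.
final class DiskTierCleaner<K> {
    private final Set<K> staleKeys = ConcurrentHashMap.newKeySet();
    private final Map<K, ?> diskTier;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    DiskTierCleaner(Map<K, ?> diskTier) {
        this.diskTier = diskTier;
        // Mirror the existing behavior: sweep stale keys on a fixed schedule (every minute).
        scheduler.scheduleAtFixedRate(this::sweep, 1, 1, TimeUnit.MINUTES);
    }

    // Called from the IndexReader close listener for every key derived from the closed reader's CacheKey.
    void markStale(K key) {
        staleKeys.add(key);
    }

    private void sweep() {
        for (K key : staleKeys) {
            diskTier.remove(key);
            staleKeys.remove(key);
        }
    }
}
```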
Milestones
Milestone 1:
Milestone 2:
Milestone 3: