Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Providing api to save querycache to disk #16822

Open
kkewwei opened this issue Dec 10, 2024 · 6 comments
Open

[Feature Request] Providing api to save querycache to disk #16822

kkewwei opened this issue Dec 10, 2024 · 6 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance

Comments

@kkewwei
Copy link
Contributor

kkewwei commented Dec 10, 2024

Is your feature request related to a problem? Please describe

It's widely acknowledged that the querycache plays a significant role in queries. However, when a node restarts, the os has to rebuild the querycache, which is a time-consuming process and can have a big impact on query performance.

1.Time-consuming to rebuild the querycache.
image

2.Query took(p99) becomes longer after the cluster restarting
image

Describe the solution you'd like

It is important for some query-sensitive indices to keep query performance, if we could provide api to save querycache to disk? when we begin to restart the node/cluster, we can first save the querycache to the disk.

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@kkewwei kkewwei added enhancement Enhancement or improvement to existing feature or request untriaged labels Dec 10, 2024
@andrross
Copy link
Member

@jainankitk @kiranprakash154 @sgup432 FYI, seems like some overlap with issues you've worked on

@peteralfonsi
Copy link
Contributor

I've been working on a proof of concept for plugging in different cache implementations to the query cache, including the TieredSpilloverCache which has a disk tier. TBD on whether the disk tier helps performance or not - the query cache entries are very large so there's a lot of overhead to serialize them. I should have numbers on this next week.

Currently the TSC doesn't persist its disk values after node restart. But if the PoC benchmark is promising it could make sense to make this change. However if the serialization/deserialization overhead is too much to actually use the disk values while the node is running, it'd probably make more sense to add some other way to dump all the contents to disk at node shutdown, and read them back on startup, and not use the TSC for this.

@kkewwei
Copy link
Contributor Author

kkewwei commented Dec 19, 2024

@peteralfonsi, In some query-sensitive scenarios,, we have proven that querycache can greatly speed up query time. This seems to be a very worthwhile to explore use TieredSpilloverCache, I would love to be a part of it.

@peteralfonsi
Copy link
Contributor

Hey @kkewwei, appreciate the interest. I've wrapped up my proof of concept earlier this week. It looks like the disk tier does not make sense here. We had previously seen significant gain by adding a disk tier to the request cache, which has key/value pairs around 1-5 KB. The query cache has much larger entries - in my nyc_taxis based workload, around 3 MB each. It seems like Ehcache (the caching library backing the disk tier used in TieredSpilloverCache) as well as deserializing the DocIdSet objects from disk cause a lot of overhead when the values are this large. Ultimately performance was worsened.

Here's an annotated flamegraph showing the overhead:
Screenshot 2024-12-18 at 9 24 13 AM

and a graph showing p90 latencies on my benchmark for 4 different settings of query cache: the original, QC disabled, TSC-backed QC, and Caffeine-backed QC.
Screenshot 2024-12-18 at 9 15 40 AM

Even though using the TieredSpilloverCache doesn't make sense, I do think dumping all or at least some of the query cache entries to disk at shutdown time and reading them back in at startup could work. One issue we'll encounter is serializing all the different implementations of Query (basically the keys in the query cache), there are >200 and I don't think all of them can be serialized even in theory since they seem to depend on some Lucene state. We could add support for different query types one by one, and just accept that not all query cache entries can be persisted after restart.

@kkewwei
Copy link
Contributor Author

kkewwei commented Dec 20, 2024

@peteralfonsi, my expectation for querycache contains heap+disk:

  1. The frequency used of docIds should be maintained in the heap.
  2. Only when the heap is insufficient, the docIds will be offloaded to the disk.

I am uncertain as to whether both heap and disk are employed in the benchmark.

@kkewwei
Copy link
Contributor Author

kkewwei commented Dec 20, 2024

We could add support for different query types one by one, and just accept that not all query cache entries can be persisted after restart.

@peteralfonsi, I aggree with you, the commonly used Query may be only 20%.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance
Projects
Status: 🆕 New
Development

No branches or pull requests

4 participants