[Feature Request] Providing api to save querycache to disk #16822

kkewwei · 2024-12-10T14:49:43Z

Is your feature request related to a problem? Please describe

It's widely acknowledged that the query cache plays a significant role in query performance. However, when a node restarts, OpenSearch has to rebuild the query cache, which is time-consuming and can noticeably degrade query performance until the cache is warm again.

1. Rebuilding the query cache is time-consuming.

2. Query took time (p99) becomes longer after the cluster restarts.

Describe the solution you'd like

For some query-sensitive indices it is important to maintain query performance. Could we provide an API to save the query cache to disk? Before restarting a node or the cluster, we could first save the query cache to disk so that it can be reloaded afterwards.

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

andrross · 2024-12-12T00:20:48Z

@jainankitk @kiranprakash154 @sgup432 FYI, seems like some overlap with issues you've worked on

peteralfonsi · 2024-12-12T20:17:16Z

I've been working on a proof of concept for plugging in different cache implementations to the query cache, including the TieredSpilloverCache which has a disk tier. TBD on whether the disk tier helps performance or not - the query cache entries are very large so there's a lot of overhead to serialize them. I should have numbers on this next week.

Currently the TieredSpilloverCache (TSC) doesn't persist its disk values after a node restart. If the PoC benchmark is promising, it could make sense to change this. However, if the serialization/deserialization overhead is too high to actually use the disk values while the node is running, it'd probably make more sense to add some other way to dump all the contents to disk at node shutdown and read them back on startup, without using the TSC for this.
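To make the "dump at shutdown, read back on startup" alternative concrete, here is a minimal Python sketch. It is only an illustration, not OpenSearch code: the real query cache is keyed by Lucene Query objects and holds DocIdSet values, while this sketch assumes a plain dict of string keys to doc ID lists. The function names `dump_cache` and `load_cache` and the snapshot file are hypothetical.

```python
import os
import pickle
import tempfile


def dump_cache(cache: dict, path: str) -> None:
    """Write the whole cache to `path`, using a temp file plus rename
    so a crash mid-write never leaves a half-written snapshot."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(cache, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)  # atomic swap into place
    finally:
        # Only true if the dump or the rename failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)


def load_cache(path: str) -> dict:
    """Return the saved cache, or an empty one if no usable snapshot exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        return {}  # fall back to a cold cache


if __name__ == "__main__":
    snapshot_path = os.path.join(tempfile.mkdtemp(), "query_cache.snapshot")
    dump_cache({"status:200": [1, 5, 9]}, snapshot_path)  # at shutdown
    print(load_cache(snapshot_path))                      # at startup
```

The serialization cost is paid once at shutdown and once at startup rather than on every cache access, which is the property that distinguishes this approach from a disk tier. Note that pickle should only be used on files the node itself wrote.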

kkewwei · 2024-12-19T04:04:49Z

@peteralfonsi, in some query-sensitive scenarios we have proven that the query cache can greatly speed up queries. Using the TieredSpilloverCache here seems well worth exploring, and I would love to be a part of it.

peteralfonsi · 2024-12-19T17:37:01Z

Hey @kkewwei, appreciate the interest. I wrapped up my proof of concept earlier this week. It looks like the disk tier does not make sense here. We had previously seen a significant gain from adding a disk tier to the request cache, which has key/value pairs of around 1-5 KB. The query cache has much larger entries: in my nyc_taxis-based workload, around 3 MB each. It seems that Ehcache (the caching library backing the disk tier used in TieredSpilloverCache), as well as deserializing the DocIdSet objects from disk, causes a lot of overhead when the values are this large. Ultimately, performance got worse.

Here's an annotated flamegraph showing the overhead:

and a graph showing p90 latencies on my benchmark for 4 different settings of query cache: the original, QC disabled, TSC-backed QC, and Caffeine-backed QC.

Even though using the TieredSpilloverCache doesn't make sense, I do think dumping all or at least some of the query cache entries to disk at shutdown time and reading them back in at startup could work. One issue we'll encounter is serializing all the different implementations of Query (basically the keys in the query cache). There are >200 of them, and I don't think all of them can be serialized even in theory, since they seem to depend on some Lucene state. We could add support for different query types one by one, and just accept that not all query cache entries can be persisted after restart.
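The "one query type at a time" idea could be sketched as a registry of per-type serializers, with unregistered types simply skipped. This is a hedged Python illustration, not Lucene or OpenSearch code: `TermQuery`, `RangeQuery` and `OpaqueQuery` are hypothetical stand-ins, as are the `ENCODERS`/`DECODERS` tables and the JSON record layout.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TermQuery:  # hypothetical stand-in, not the Lucene class
    field: str
    value: str


@dataclass(frozen=True)
class RangeQuery:  # hypothetical stand-in, not the Lucene class
    field: str
    low: int
    high: int


class OpaqueQuery:
    """Stands in for a query type that depends on live index state."""


# Support is added one type at a time by registering an encoder/decoder pair.
ENCODERS = {
    TermQuery: ("term", lambda q: {"field": q.field, "value": q.value}),
    RangeQuery: ("range", lambda q: {"field": q.field, "low": q.low, "high": q.high}),
}
DECODERS = {
    "term": lambda d: TermQuery(d["field"], d["value"]),
    "range": lambda d: RangeQuery(d["field"], d["low"], d["high"]),
}


def snapshot(cache):
    """Serialize entries whose key type is registered; count the rest as skipped."""
    records, skipped = [], 0
    for query, doc_ids in cache.items():
        entry = ENCODERS.get(type(query))
        if entry is None:
            skipped += 1  # unsupported type: accept that it is not persisted
            continue
        tag, encode = entry
        records.append({"type": tag, "key": encode(query), "docs": list(doc_ids)})
    return json.dumps(records), skipped


def restore(payload):
    """Rebuild the cache, ignoring records whose type is no longer registered."""
    cache = {}
    for record in json.loads(payload):
        decode = DECODERS.get(record["type"])
        if decode is not None:
            cache[decode(record["key"])] = record["docs"]
    return cache
```

Entries keyed by an unregistered type are dropped from the snapshot and would be rebuilt on demand after restart, so partial coverage degrades to today's behavior for those queries rather than failing.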

kkewwei · 2024-12-20T09:56:00Z

@peteralfonsi, my expectation is that the query cache uses both heap and disk:

1. Frequently used docIds should be kept on the heap.
2. DocIds should be offloaded to disk only when the heap is insufficient.

I'm not sure whether both heap and disk were used in the benchmark.
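The heap-plus-disk behavior described in the two points above can be sketched as a small two-tier cache. This Python sketch is an assumption-laden illustration and not how TieredSpilloverCache is implemented: the class name `HeapThenDiskCache`, the LRU eviction policy, the entry-count capacity and the use of pickle files are all hypothetical choices.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class HeapThenDiskCache:
    """Keeps recently used entries in memory; spills the least recently
    used entry to a file only when the in-memory tier is full."""

    def __init__(self, heap_capacity, disk_dir):
        self.heap_capacity = heap_capacity
        self.disk_dir = disk_dir
        self.heap = OrderedDict()  # key -> value, least recently used first
        self.on_disk = {}          # key -> path of the spilled value

    def _spill_oldest(self):
        key, value = self.heap.popitem(last=False)
        fd, path = tempfile.mkstemp(dir=self.disk_dir)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(value, f)  # serialization cost is paid here
        self.on_disk[key] = path

    def put(self, key, value):
        stale = self.on_disk.pop(key, None)
        if stale is not None:
            os.unlink(stale)  # drop an outdated disk copy
        self.heap[key] = value
        self.heap.move_to_end(key)
        while len(self.heap) > self.heap_capacity:
            self._spill_oldest()

    def get(self, key):
        if key in self.heap:
            self.heap.move_to_end(key)  # mark as recently used
            return self.heap[key]
        path = self.on_disk.pop(key, None)
        if path is None:
            return None
        with open(path, "rb") as f:
            value = pickle.load(f)  # deserialization cost is paid here
        os.unlink(path)
        self.put(key, value)  # promote back to the heap tier
        return value
```

Every disk hit in `get` pays a full deserialization, which is the step reported as expensive in the benchmark earlier in the thread when entries are around 3 MB each.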

kkewwei · 2024-12-20T10:02:31Z

> We could add support for different query types one by one, and just accept that not all query cache entries can be persisted after restart.

@peteralfonsi, I agree with you. The commonly used Query types may be only about 20% of them.

kkewwei added the enhancement and untriaged labels Dec 10, 2024

github-actions bot added the Search:Performance label Dec 10, 2024

sandeshkr419 removed the untriaged label Dec 18, 2024

peterzhuamazon added this to Search Project Board Dec 19, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Dec 19, 2024
