[RFC] Add support for read-ahead doc values prefetch #16727

Open
kasundra07 opened this issue Nov 27, 2024 · 4 comments

@kasundra07
Contributor

Is your feature request related to a problem? Please describe

A searchable snapshot index uses a block-based storage mechanism to read data from a snapshot repository on demand at search time, rather than downloading all index data to cluster storage at restore time. Only the parts of the Lucene IndexInput files that a query accesses are fetched from the snapshot in the repository, instead of downloading the entire files to disk, which reduces the total storage required per node.
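To make the block-based mechanism concrete, here is a minimal sketch of how a single read of `length` bytes at `position` maps onto fixed-size blocks; only those blocks would need to be fetched from the repository. `BlockMath` and its methods are hypothetical names, and the 8 MB block size is borrowed from the example in this issue rather than taken from the OpenSearch implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not OpenSearch code): maps a read of
 * [position, position + length) within a file to the indices of the
 * fixed-size blocks it touches. Only these blocks would be fetched.
 */
final class BlockMath {
    private BlockMath() {}

    /** Index of the block containing byte offset {@code position}. */
    static long blockIndex(long position, long blockSize) {
        return position / blockSize;
    }

    /** Block indices covering the read, in order; empty for zero-length reads. */
    static List<Long> blocksForRead(long position, long length, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        if (length <= 0) {
            return blocks;
        }
        long first = blockIndex(position, blockSize);
        long last = blockIndex(position + length - 1, blockSize); // last byte read
        for (long b = first; b <= last; b++) {
            blocks.add(b);
        }
        return blocks;
    }

    public static void main(String[] args) {
        long blockSize = 8L * 1024 * 1024; // assumed 8 MB blocks
        // A 1 KB read that straddles the boundary between block 0 and block 1.
        System.out.println(blocksForRead(blockSize - 512, 1024, blockSize)); // [0, 1]
    }
}
```

A read that straddles a block boundary therefore costs two block fetches, while small reads inside one block cost a single fetch; this per-block granularity is what makes many scattered reads expensive on a cold cache.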

However, for aggregation-heavy workloads, fetching data blocks on demand can add latency, because each query may need multiple I/O operations to fetch the data it requires.

To improve performance in such scenarios, we can introduce a read-ahead prefetch mechanism that, when the current block is accessed, proactively fetches the next N blocks in anticipation of demand, thus reducing I/O bottlenecks.
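The read-ahead idea can be sketched as a small planner: on access to block `current`, it returns the blocks `current + 1` through `current + N` that are not already local or in flight, clamped to the last block of the file. `ReadAheadPlanner` is a hypothetical name, and the sketch deliberately ignores cache eviction and the actual asynchronous download; it only shows which blocks would be scheduled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical read-ahead planner (not OpenSearch code). When a block is
 * accessed, it decides which of the next N blocks to prefetch, skipping
 * blocks already tracked as local or in flight. Eviction is not modeled.
 */
final class ReadAheadPlanner {
    private final int readAheadBlocks;                 // N, the read-ahead window
    private final Set<Long> localOrInFlight = new HashSet<>();

    ReadAheadPlanner(int readAheadBlocks) {
        this.readAheadBlocks = readAheadBlocks;
    }

    /** Returns block indices to prefetch after {@code current} is accessed. */
    synchronized List<Long> onBlockAccess(long current, long totalBlocks) {
        localOrInFlight.add(current); // the current block itself is fetched on demand
        List<Long> toPrefetch = new ArrayList<>();
        long end = Math.min(current + readAheadBlocks, totalBlocks - 1); // clamp to file end
        for (long b = current + 1; b <= end; b++) {
            if (localOrInFlight.add(b)) { // false if the block is already tracked
                toPrefetch.add(b);
            }
        }
        return toPrefetch;
    }

    public static void main(String[] args) {
        ReadAheadPlanner planner = new ReadAheadPlanner(3);
        System.out.println(planner.onBlockAccess(10, 100)); // [11, 12, 13]
        System.out.println(planner.onBlockAccess(11, 100)); // [14]
        System.out.println(planner.onBlockAccess(98, 100)); // [99]
    }
}
```

With a sequential scan, each access after the first only adds one new block to the window, so the prefetcher stays N blocks ahead of the reader without re-requesting blocks it already scheduled.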

Describe the solution you'd like

The read-ahead prefetch mechanism leverages the sequential access pattern typically seen in aggregation queries. Aggregation queries read .dvd files to retrieve the doc values they need. Such queries typically touch all documents matching the query clause, resulting in reads across many blocks of the .dvd files. Because .dvd files are compact relative to the shard size, contiguous blocks are more likely to be accessed. For instance, we've seen that for a 48.5 GB shard with 620 million documents, the .dvd file was about 8.05 GB (approximately 1031 blocks, assuming 8 MB blocks). Given this compactness, most matching docs are likely to map to adjacent or nearby blocks. This access pattern is ideal for read-ahead prefetching, since the next set of blocks a query needs can often be anticipated from the block currently being processed. This can significantly reduce latency and I/O wait times, because the blocks will likely already be in the local store by the time the query needs them.
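The block estimate above is a ceiling division of file size by block size. The short sketch below reproduces it; it assumes binary units (1 GB = 2^30 bytes, 1 MB = 2^20 bytes), which is what matches the ~1031 figure (with decimal units, 8.05e9 / 8e6 would give about 1007 blocks). `BlockCount` is a hypothetical name.

```java
/** Hypothetical worked example: number of fixed-size blocks in a file. */
final class BlockCount {
    private BlockCount() {}

    /** ceil(fileSize / blockSize) using integer math; a partial last block still counts. */
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;             // assumes binary GB
        long dvdSize = (long) (8.05 * gib);         // ~8.05 GB .dvd file from the example
        long blockSize = 8L * 1024 * 1024;          // 8 MB blocks
        System.out.println(blockCount(dvdSize, blockSize)); // 1031
    }
}
```

Roughly a thousand blocks for a 620-million-document shard means each 8 MB block holds the doc values for hundreds of thousands of documents on average, which is why matching documents tend to land in the same or neighboring blocks.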

Since aggregation queries use the underlying Lucene .dvd files, we can start by integrating read-ahead prefetch for .dvd files. It can easily be extended to other files where we expect a similar contiguous access pattern, e.g., the kNN exact search use case without pre-filtering, which scans .vec files sequentially. We would also need to handle .dvd accesses within .cfs files when the compound file format is enabled.
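One way to picture the eligibility check, including the compound-file case, is sketched below. Plain files are matched by extension; for a .cfs file, the read offset is looked up against the byte ranges of its sub-files, which in Lucene are recorded in the companion .cfe entries file. The class, the `SubFile` record, how those ranges are obtained, and the short file names in the example are all illustrative assumptions, not OpenSearch or Lucene API.

```java
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical read-ahead eligibility check (not OpenSearch code). Plain files
 * match by extension; inside a compound (.cfs) file, the offset is mapped to
 * the sub-file that contains it, whose ranges are assumed to be known.
 */
final class ReadAheadEligibility {
    /** Byte range of one sub-file inside a compound file. */
    record SubFile(long offset, long length) {
        boolean contains(long position) {
            return position >= offset && position < offset + length;
        }
    }

    private final Set<String> prefetchableExtensions;

    ReadAheadEligibility(Set<String> prefetchableExtensions) {
        this.prefetchableExtensions = prefetchableExtensions;
    }

    private static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    /** True if a plain (non-compound) file should use read-ahead. */
    boolean isPrefetchable(String fileName) {
        return prefetchableExtensions.contains(extension(fileName));
    }

    /** True if {@code position} in a .cfs file lies inside a prefetchable sub-file. */
    boolean isPrefetchableInCompound(Map<String, SubFile> subFiles, long position) {
        for (Map.Entry<String, SubFile> e : subFiles.entrySet()) {
            if (e.getValue().contains(position)) {
                return isPrefetchable(e.getKey());
            }
        }
        return false; // offset not covered by any known sub-file
    }

    public static void main(String[] args) {
        ReadAheadEligibility check = new ReadAheadEligibility(Set.of("dvd", "vec"));
        System.out.println(check.isPrefetchable("_0.dvd")); // true
        System.out.println(check.isPrefetchable("_0.tim")); // false
        Map<String, SubFile> cfs = Map.of(
                "_0.dvd", new SubFile(0, 1_000),
                "_0.tim", new SubFile(1_000, 500));
        System.out.println(check.isPrefetchableInCompound(cfs, 500));   // true
        System.out.println(check.isPrefetchableInCompound(cfs, 1_200)); // false
    }
}
```

Checking per offset rather than per file matters for .cfs files, since prefetching ahead from a .dvd region is only useful while the next blocks still belong to that region.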

I will work on a POC for this and gather performance numbers after testing read-ahead with aggregation queries on searchable snapshots. Meanwhile, our internal testing showed that read-ahead prefetch reduced aggregation query times by up to 60-70%.

Related component

Search:Searchable Snapshots

Describe alternatives you've considered

No response

Additional context

No response

@jed326
Collaborator

jed326 commented Nov 27, 2024

Thanks @kasundra07 this sounds great.

for now, we can start with integrating read-ahead prefetch for .dvd files,

I'm wondering if we should just start by making the list of supported "prefetchable" file types a cluster setting (maybe even a dynamic one)? At a glance, it doesn't seem like you need to do anything differently to read ahead .dvd files vs .vec files, so I'm thinking we could just build that functionality and then determine the defaults based on testing.
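The dynamic-setting idea can be sketched in plain Java: readers check file names against an immutable set of extensions, and a settings update swaps in a new set atomically so in-flight reads never see a half-updated list. In OpenSearch this would presumably be wired through the cluster settings infrastructure; the class name, the comma-separated value format, and the `dvd`-only default below are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

/**
 * Hypothetical holder for a dynamic "prefetchable file types" setting (not
 * OpenSearch code). Models only runtime-update semantics: an immutable
 * snapshot that is swapped atomically whenever the setting changes.
 */
final class PrefetchableFileTypes {
    // Assumed default; the real defaults would be chosen based on testing.
    private final AtomicReference<Set<String>> extensions =
            new AtomicReference<>(Set.of("dvd"));

    /** Parses a value such as "dvd, .vec" and swaps it in atomically. */
    void update(String commaSeparated) {
        Set<String> parsed = Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.startsWith(".") ? s.substring(1) : s) // accept ".vec" or "vec"
                .collect(Collectors.toUnmodifiableSet());
        extensions.set(parsed);
    }

    /** True if the file's extension is currently configured for read-ahead. */
    boolean isPrefetchable(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 && extensions.get().contains(fileName.substring(dot + 1));
    }

    public static void main(String[] args) {
        PrefetchableFileTypes types = new PrefetchableFileTypes();
        System.out.println(types.isPrefetchable("_0.vec")); // false with the default
        types.update("dvd, .vec");
        System.out.println(types.isPrefetchable("_0.vec")); // true after the update
    }
}
```

Keeping the check keyed only on file type, as suggested above, means adding .vec later is a configuration change rather than a code change.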

@kasundra07
Contributor Author

@jed326 That's a good suggestion. Yes, we can use a cluster setting to keep track of the file types that support read-ahead.

@kasundra07
Contributor Author

Tagging a few folks for additional thoughts/suggestions: @sohami @andrross

@andrross
Member

Sounds like a great idea @kasundra07!

Projects
Status: 🆕 New
Development

No branches or pull requests

4 participants