[Searchable Snapshots] [Low Level Design] Block Based Storage #4033
Comments
Quality design! Thanks @kotwanikunal. What about the snapshots themselves? Today we can create a snapshot under any name, containing any subset of the indices from the cluster. To make matters more complex, several snapshots could contain the same index. Will this feature require the snapshot to be named a certain way, or to contain only one index? ...
Thanks @AmiStrn! @andrross can chime in if I have missed anything on the API front.
I'm a bit confused by the terminology - block storage vs. object storage. Today snapshots are mainly stored on object storage. Where are the details about implementing block storage over an object storage?
@nir-logzio The term "block" is being used a bit loosely here. The high level idea is that when Lucene needs to read a part of a segment file, instead of downloading the entire file onto the local disk cache this will only download the part of the file necessary and store them as logical "blocks" on the local disk. The terminology is perhaps a bit confusing but this doesn't mean that we'll be using a remote block store (e.g. GCP Persistent Disk, AWS EBS, etc). The remote storage remains the object stores (all supported repository implementations).
Good one @kotwanikunal , super minor comment / suggestion to go from From implementation perspective, does it make sense to implement such index reader using Lucene's BufferedIndexInput / SlicedIndexInput? It seems like "block" may fit well into the "slice" in this case, just throwing an idea out there ... |
Thanks @reta! Sure. Matching the conventions sounds like a good plan. I will look into the |
Implemented as a part of #4892 |
This document outlines the low-level design proposal for implementing block-based storage. High-level design document and proposal: #3869
Overview
A block-based file system will enable fetching parts of the Lucene IndexInput files from the snapshot in the repository instead of downloading each entire file to disk; only the parts of a file that a query actually reads are downloaded.
BlockedIndexInput
The solution implements a wrapper around the IndexInput class to manage the block calculation, fetching, and seeking mechanisms. This wrapper will work as a virtual file which will be utilized by Lucene to read index files. Internally, it will keep track of the necessary blocks and the calculations required to fetch other blocks as the query demands.
VirtualFileIndexInput
Additionally, another wrapper will be used to fetch the virtual file data (https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/repositories/blobstore/BlobStoreRepository.java#L205), which consists of the metadata around the segment files for Lucene. This will work on a “fetch-when-read” basis, downloading the entire file to disk when the service needs it.
The design has been broken down into two phases:
Low Level Design
Phase 1: Without Cache
As described above, Phase 1 will only implement the mechanisms necessary to enable block-based fetching of segment files. In addition to the interface definitions, we will implement the block calculation and fetch logic within the BlockedIndexInput class.
Properties and methods such as getBlock*, getCurrentBlock*, blockSize, blockMask, and blockSizeShift will drive the block calculation logic for the actual segment file described by fileInfo.
The fetchBlock, downloadBlockAsync, and downloadTo methods will handle the fetching logic for blocks as well as virtual files.
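The block calculation can be illustrated with a minimal sketch, assuming the block size is a power of two so that shifts and masks replace division and modulo. The class name BlockMath and the exact method names and signatures are hypothetical; only blockSize, blockMask, and blockSizeShift are taken from the list above.

```java
/**
 * Hypothetical helper illustrating power-of-two block arithmetic.
 * The class and method names are assumptions, not the proposed API.
 */
final class BlockMath {
    final int blockSizeShift; // log2 of the block size
    final long blockSize;     // bytes per block, always a power of two
    final long blockMask;     // low bits that select the offset inside a block

    BlockMath(int blockSizeShift) {
        if (blockSizeShift < 0 || blockSizeShift > 30) {
            throw new IllegalArgumentException("blockSizeShift out of range: " + blockSizeShift);
        }
        this.blockSizeShift = blockSizeShift;
        this.blockSize = 1L << blockSizeShift;
        this.blockMask = blockSize - 1;
    }

    /** Index of the block that contains the given file position. */
    int getBlock(long pos) {
        return (int) (pos >>> blockSizeShift);
    }

    /** Offset of the given file position inside its block. */
    int getBlockOffset(long pos) {
        return (int) (pos & blockMask);
    }

    /** File position at which the given block starts. */
    long getBlockStart(int block) {
        return (long) block << blockSizeShift;
    }

    /** Length of the given block; the last block of a file may be shorter. */
    long getBlockLength(int block, long fileLength) {
        return Math.min(blockSize, fileLength - getBlockStart(block));
    }
}
```

With a shift of 4 (16-byte blocks, chosen only to keep the numbers small), position 37 falls in block 2 at offset 5, and for a 40-byte file that last block is 8 bytes long.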
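One way the fetch path could fit together is sketched below. This is an illustration under stated assumptions, not the proposed implementation: the class BlockFetcher is hypothetical, an in-memory byte array stands in for the blob in the repository, and the signatures given to fetchBlock and downloadBlockAsync are guesses based only on their names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: download a block on first access, reuse it afterwards. */
final class BlockFetcher {
    private final byte[] remoteFile;  // stand-in for the segment file in the repository
    private final int blockSizeShift; // log2 of the block size
    private final Map<Integer, CompletableFuture<byte[]>> blocks = new ConcurrentHashMap<>();
    final AtomicInteger downloads = new AtomicInteger(); // counts simulated remote reads

    BlockFetcher(byte[] remoteFile, int blockSizeShift) {
        this.remoteFile = remoteFile;
        this.blockSizeShift = blockSizeShift;
    }

    /** Starts at most one download per block; concurrent callers share the same future. */
    CompletableFuture<byte[]> downloadBlockAsync(int block) {
        return blocks.computeIfAbsent(block, b -> CompletableFuture.supplyAsync(() -> {
            downloads.incrementAndGet();
            int start = b << blockSizeShift;
            int end = Math.min(remoteFile.length, start + (1 << blockSizeShift));
            // A real implementation would issue a ranged read against the repository here.
            return Arrays.copyOfRange(remoteFile, start, end);
        }));
    }

    /** Blocks the caller until the requested block is available locally. */
    byte[] fetchBlock(int block) {
        return downloadBlockAsync(block).join();
    }

    /** Reads one byte, fetching only the block that contains it. */
    byte readByte(long pos) {
        if (pos < 0 || pos >= remoteFile.length) {
            throw new IndexOutOfBoundsException("position out of range: " + pos);
        }
        int block = (int) (pos >>> blockSizeShift);
        int offset = (int) (pos & ((1L << blockSizeShift) - 1));
        return fetchBlock(block)[offset];
    }
}
```

Two reads that land in the same block trigger a single simulated download, which is the behavior the block-based design aims for.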
Phase 2: With Cache
Phase 2 will build on the Phase 1 implementation by caching the blocks. Blocks will be deleted from local disk when they are evicted from the cache, which reduces storage needs.
FileCachedIndexInput will implement a RefCount mechanism to keep track of open handles as well as ensure the file is cached within a BlockCache/FileCache.
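The interaction between reference counting and eviction might look like the sketch below. It assumes a least-recently-used policy, which the proposal does not specify, and every name in it (RefCountedBlockCache, acquire, release) is hypothetical. The point it illustrates is that an entry with open handles is never evicted.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache of block files that never evicts an entry with open handles. */
final class RefCountedBlockCache {
    private static final class CachedBlock {
        final long size;
        int refCount;
        CachedBlock(long size) { this.size = size; }
    }

    private final long capacityBytes;
    private long usedBytes;
    // accessOrder=true keeps entries ordered from least to most recently used
    private final LinkedHashMap<String, CachedBlock> entries = new LinkedHashMap<>(16, 0.75f, true);

    RefCountedBlockCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Registers an open handle on a block file, adding the file to the cache if absent. */
    synchronized void acquire(String blockFile, long sizeBytes) {
        CachedBlock block = entries.get(blockFile);
        if (block == null) {
            block = new CachedBlock(sizeBytes);
            entries.put(blockFile, block);
            usedBytes += sizeBytes;
        }
        block.refCount++;
        evictIfNeeded();
    }

    /** Closes one handle; the entry becomes evictable once no handles remain. */
    synchronized void release(String blockFile) {
        CachedBlock block = entries.get(blockFile);
        if (block == null || block.refCount == 0) {
            throw new IllegalStateException("no open handle for " + blockFile);
        }
        block.refCount--;
        evictIfNeeded();
    }

    /** Removes unreferenced entries, least recently used first, while over capacity. */
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, CachedBlock>> it = entries.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            CachedBlock block = it.next().getValue();
            if (block.refCount == 0) {
                it.remove(); // a real implementation would also delete the file from disk
                usedBytes -= block.size;
            }
        }
    }

    synchronized boolean contains(String blockFile) {
        return entries.containsKey(blockFile);
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

In this sketch the cache can temporarily exceed its capacity when every entry is still referenced; whether the actual design should reject new entries in that case is left open here.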
Directory
ReadOnlyDirectory will be implemented to block writes for the Store and will complement the ReadOnlyEngine for Searchable Snapshots. It will utilize the IndexInput classes described above to open the index for reads for a remote snapshot.