[Writable Warm] Design and implement Composite Directory and integrate with FileCache #12781
Thanks @rayshrey for putting this proposal out. I like the overall idea of having this abstraction behind the Directory interface. Few questions:
Calls over the network may have higher latency costs, and hence high thread wait time. Can this cause the write/search threads to be blocked more than we would like and will it make sense/be feasible to offload this to either async I/O or a different pool?
Does this signify that file can be removed from the local store if needed?
What would this cache look like? I assume this is an on-disk cache. Could you elaborate on how it will look on disk?
Will this not require knowledge of the block being requested? Or does it ensure that files are always present in the cache completely?
Thanks @mgodwan for the insightful comments.
Good point. I think writes will be async as uploads to the remote store will be taken care of by the remote directory itself. Will check the feasibility for reads as well.
Yes, once uploaded to the remote store, local files can be deleted.
There is already an existing FileCache in OpenSearch which is currently being used for SearchableSnapshots. We will be reusing the same. Currently it does not support tiering the data at different levels, nor does it have TTL logic. Will open a separate issue for the FileCache changes that are needed. This issue mainly focuses on how the Composite Directory will be structured and how the FileCache will fit into this structure.
The approach I was thinking of is that both BLOCK and NON-BLOCK file types will be present in the Cache. For NON-BLOCK files we simply return from the Cache, whereas for BLOCK files we do what we did for the REMOTE FileState - return an instance of OnDemandBlockIndexInput which handles all the abstractions for block-level fetch (including caching the BLOCK files in FileCache as and when required). The other approach we can take is to keep only BLOCK-level files in the Cache and always return an instance of OnDemandBlockIndexInput. The first approach sounds more reasonable to me as it gives us the flexibility of choosing what we want to fetch according to our requirements - BLOCK or NON-BLOCK files. Your thoughts on this - @ankitkala @mgodwan?
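To make the first approach concrete, a rough sketch of how the openInput dispatch could look is below; the FileType enum, tracker lookup, and helper methods are illustrative names only, not the actual implementation:

```java
// Illustrative only: approach 1, where both BLOCK and NON-BLOCK files live in the FileCache.
enum FileType { BLOCK, NON_BLOCK }

public IndexInput openInput(String name, IOContext context) throws IOException {
    FileType type = fileTracker.getFileType(name);        // hypothetical tracker lookup
    if (type == FileType.NON_BLOCK) {
        // The whole file is cached locally, so serve it straight from the FileCache.
        return readFullFileFromCache(name, context);      // hypothetical helper
    }
    // BLOCK file: return a block-aware input that fetches and caches the
    // required blocks from the remote store as they are read.
    return createOnDemandBlockIndexInput(name, context);  // hypothetical factory
}
```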
@andrross @sohami @neetikasinghal
@rayshrey Thanks for creating this issue. Couple of questions:
Thanks for writing this up. Some thoughts/questions.
Thanks @sohami and @mch2 for your insights.
Yes, the composite directory will be used for remote-backed hot indices as well. Will add support for that incrementally in another PR once the base implementation is finalized in this one.
For hot data, we won't be caching it into FileCache, so all the data will be present locally for hot indices.
For data present locally we are not putting it in FileCache; we are simply fetching it directly from the localDirectory to keep things simple (adding local files to FileCache and then fetching them from there wouldn't have any added benefit). For separation of hot/warm data files we will need to have some sort of migration logic in the Directory itself once we start adding support for remote-backed hot indices in the composite Directory.
I have modified the logic to check the local and remote directories directly to determine whether the files are present locally or in remote.
As of now we are not putting the entire file in Cache, so FileCache will only have block files.
Yes, it doesn't really seem necessary. As stated earlier, I have updated the design and it can be checked in this PR.
Yes, this was a con in the previous approach, and hence we have decided to move away from that to a new setting which indicates whether full data is cached locally (hot index) or partial data is cached locally (warm index). This will allow users to have their own store type as the local directory.
I have updated the design to remove the FileTracker and RemoteStoreFileTrackerAdapter overheads and am just injecting a remote directory now. You can refer to the PR until I add the updated design to the description.
The problem with having an implementation similar to RemoteSnapshotDirectory is that the TransferManager uses BlobContainer for fetching the file. To get the BlobContainer of RemoteDirectory we would need to expose a method in RemoteSegmentStoreDirectory to get the BlobContainer of its remoteDataDirectory, which does not seem right as it would leak the abstractions of RemoteSegmentStoreDirectory. Hence we have exposed a new method in RemoteSegmentStoreDirectory which fetches the required file from remote to local, and this is called in the fetchBlock method of OnDemandCompositeBlockIndexInput.
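As an illustration of that flow (the method and field names below are assumptions, not the actual API), fetchBlock could look roughly like this:

```java
// Rough sketch of OnDemandCompositeBlockIndexInput.fetchBlock; field and method names are
// illustrative. downloadFileToLocal stands in for the new RemoteSegmentStoreDirectory method
// described above that copies the required data from remote to the local store.
@Override
protected IndexInput fetchBlock(int blockId) throws IOException {
    final String blockFileName = fileName + "._block_" + blockId;   // illustrative block naming
    final long startPosition = (long) blockId * blockSize;
    final long length = Math.min(blockSize, originalFileSize - startPosition);

    // Ask the remote segment store directory to bring this range down locally,
    // then open the downloaded block through the FileCache.
    remoteSegmentStoreDirectory.downloadFileToLocal(fileName, startPosition, length, blockFileName); // hypothetical
    return openBlockFromFileCache(blockFileName);                   // hypothetical helper
}
```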
The benefit is that we are keeping the separation of hot/warm data, which are referenced by the local directory vs the FileCache. That means all the local space occupied by a warm index will always be accounted for via FileCache, and that can be used by any accounting/disk-monitoring mechanism for the warm tier. Otherwise it will be difficult to explain when an index's data is managed with or without FileCache.
@rayshrey thanks for posting this. Some thoughts/questions on the migration flows:
Most of the design discussion on this shifted to the POC implementation (which later turned into a concrete PR after the reviews). Listing down the current design that was implemented, along with some other basic design decisions that were taken.
WARM Index Setting
Introduced a new index setting index.store.data_locality which can be either:
- full - the entire index data is cached locally (hot index)
- partial - only part of the index data is cached locally (warm index)
Example
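The original example isn't reproduced in this summary; as a hedged illustration, creating a warm index with this setting from a Java client could look like the sketch below (assuming partial is the value used for warm indices, and that client is an existing org.opensearch.client.Client):

```java
// Illustrative only: create a warm index whose data is only partially cached locally.
// Settings is org.opensearch.common.settings.Settings;
// CreateIndexRequest is org.opensearch.action.admin.indices.create.CreateIndexRequest.
Settings indexSettings = Settings.builder()
    .put("index.store.data_locality", "partial")   // "partial" => warm index, "full" => hot index
    .build();

client.admin().indices()
    .create(new CreateIndexRequest("my-warm-index").settings(indexSettings))
    .actionGet();
```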
Class Diagram
Composite Directory Read and Write Flows
Write (createOutput)
Writes go to the local directory; a full-file entry for the newly written file is also added to the FileCache.
File Uploaded to Remote (afterSyncToRemote)
Whenever a file is uploaded to remote, we will already have a full-file entry for it in the FileCache (from the above write flow).
Read (openInput)
Reads first consult the FileCache; a fully cached file is returned directly, otherwise a block-level IndexInput is returned which fetches the required blocks from remote into the FileCache on demand.
Changes in FileCache Initialization
FileCache was initially introduced only for the Searchable Snapshot use case and was initialized only on nodes configured with the Search role. Since we will be using FileCache for Writable Warm as well, we currently initialize FileCache based on our feature flag and reserve 80% of the node capacity for it.
TODO - Explore whether we can have different node roles that determine when the FileCache is initialized, such as a WARM role (similar to the SEARCH role used earlier for Searchable Snapshots).
RemoteDirectory Changes
Currently RemoteDirectory only supports reading a full file via the openInput method.
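For Writable Warm we also need to read parts of a file (blocks) from remote. The exact API change isn't captured above; as a rough sketch of the idea (the method name and signature are hypothetical), a ranged, block-level read on RemoteDirectory could look like:

```java
// Hypothetical sketch of a ranged (block-level) read added to RemoteDirectory.
// `blobContainer` is the blob store the directory already wraps;
// BlobContainer.readBlob(name, position, length) reads only the requested byte range.
public InputStream openBlockStream(String fileName, long position, long length) throws IOException {
    return blobContainer.readBlob(fileName, position, length);
}
```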
Changes in TransferManager
Currently TransferManager is configured to read only from a BlobContainer, since in its original use case (Searchable Snapshot) the BlobContainer was already exposed. But for Composite Directory the BlobContainer is abstracted away, and we will need to be able to read directly from the Remote Directory as well. Hence we need to change the BlobContainer dependency to a more generic StreamReader, as sketched below.
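The referenced snippet isn't included above; a minimal sketch of what such a generic reader abstraction could look like (the exact name and shape in the PR may differ) is:

```java
// Illustrative sketch of a generic reader abstraction that TransferManager could
// depend on instead of a concrete BlobContainer.
import java.io.IOException;
import java.io.InputStream;

@FunctionalInterface
public interface StreamReader {
    // Read `length` bytes of the file `name` starting at `position`.
    InputStream read(String name, long position, long length) throws IOException;
}
```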
This is how we will initialize TransferManager for Searchable Snapshot and for Composite Directory:
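The original snippet isn't reproduced here; assuming the StreamReader-based constructor sketched above, the wiring could look roughly like:

```java
// Illustrative only: wiring TransferManager against the StreamReader sketch above.

// Searchable Snapshot: the BlobContainer is available, so its ranged readBlob is passed directly.
TransferManager snapshotTransferManager =
    new TransferManager(blobContainer::readBlob, fileCache);

// Composite Directory: the BlobContainer is hidden behind the remote directory, so reads
// go through it instead (openBlockStream is the hypothetical method sketched earlier).
TransferManager compositeTransferManager =
    new TransferManager(remoteDirectory::openBlockStream, fileCache);
```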
Is your feature request related to a problem? Please describe
Currently we don't have support for any directory implementation which can interact with both local and remote repositories. We are proposing a new directory implementation where data is backed by a remote store and not all data needs to be stored locally. This directory will behave as a local directory when complete files are present on disk, but can fall back to on-demand fetch (which can be extended to block-level or non-block-level fetch) from the remote store when data is not present locally.
Describe the solution you'd like
How will the user be able to create a Composite Directory for an index?
We will add a new type to the index.store.type setting - compositefs - to indicate that this index will use a composite directory.
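For illustration only, the proposed setting could be supplied when creating an index roughly as follows:

```java
// Illustrative only: the originally proposed store type for a composite-directory-backed index.
// Settings is org.opensearch.common.settings.Settings.
Settings indexSettings = Settings.builder()
    .put("index.store.type", "compositefs")
    .build();
```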
What will the Composite Directory look like?
Here’s what the Class Diagram for Composite Directory will look like:
Our Composite Directory will have an FSDirectory instance (localDirectory), a FileCache instance and a RemoteStoreFileTracker implementation. Most of the file-tracking abstractions, such as adding files to the tracker and checking whether they are present locally or in remote, are handled in the implementation of the RemoteStoreFileTracker object - CompositeDirectoryRemoteStoreFileTracker. Abstractions such as fetching files from remote which are not available locally will be handled in the fetchBlob function, where we will simply fetch the required files (in block or non-block format). This fetchBlob function will be called in the implementation of the fetchBlock function of OnDemandCompositeBlockIndexInput (all abstractions related to block-level fetch are handled in this class only).
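Since the class diagram image isn't reproduced here, a rough skeleton of the structure described above is sketched below; the constructor wiring is illustrative, and RemoteStoreFileTracker and OnDemandCompositeBlockIndexInput are the proposed classes from this issue:

```java
// Rough skeleton of the proposed CompositeDirectory, following the description above.
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.FilterDirectory;
import org.opensearch.index.store.remote.filecache.FileCache;

public class CompositeDirectory extends FilterDirectory {
    private final FSDirectory localDirectory;          // fully local files
    private final FileCache fileCache;                 // cache for block / full-file entries
    private final RemoteStoreFileTracker fileTracker;  // proposed tracker (CompositeDirectoryRemoteStoreFileTracker)

    public CompositeDirectory(FSDirectory localDirectory, FileCache fileCache,
                              RemoteStoreFileTracker fileTracker) {
        super(localDirectory);
        this.localDirectory = localDirectory;
        this.fileCache = fileCache;
        this.fileTracker = fileTracker;
    }

    // Reads and writes delegate to localDirectory when possible; files missing locally are
    // fetched through fileTracker.fetchBlob, which OnDemandCompositeBlockIndexInput.fetchBlock
    // calls for block-level access (see the read/write sections below).
}
```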
More details on when the states of a file change and how reads and writes are handled are given below.
When will the states of a file change?
Any file in the Composite Directory goes through the following state changes:
How will reads be handled in Composite Directory?
Whenever a file read is requested (openInput), we will first check the state of the file in our FileTracker. If the file state is:
How will writes be handled in Composite Directory?
Whenever a file write is requested (createOutput), we will fall back to the localDirectory to write the file. Since our IndexShard object already has a remote store object containing a remote directory, writes to the remote directory are handled via that only. Our CompositeDirectory will have a function - afterSyncToRemote (called in RemoteStoreRefreshListener after the segments are uploaded) - which will take care of writing the files to the cache once the file is uploaded to the remote store.
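A rough sketch of how createOutput and afterSyncToRemote could fit together is below; the tracker and cache helper calls are illustrative names, not the actual implementation:

```java
// Illustrative only: write path of the proposed CompositeDirectory.
@Override
public IndexOutput createOutput(String name, IOContext context) throws IOException {
    fileTracker.trackFile(name);                       // hypothetical: record the new, local-only file
    return localDirectory.createOutput(name, context); // writes always land in the local directory
}

// Called from RemoteStoreRefreshListener after the uploaded segments reach the remote store.
public void afterSyncToRemote(Collection<String> uploadedFiles) throws IOException {
    for (String file : uploadedFiles) {
        addToFileCache(file);                          // hypothetical: cache entry for the uploaded file
        fileTracker.markUploadedToRemote(file);        // hypothetical: local copy may now be evicted
    }
}
```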
Looking forward to review comments and discussion on this.
Related component
Storage:Remote
Describe alternatives you've considered
No response
Additional context
No response