
[Remote Store] During index creation, upload path for tlog-data, tlog-md, segment-data and segment-md to fixed base path #12661

Closed
ashking94 opened this issue Mar 14, 2024 · 3 comments · Fixed by #13150
Labels: enhancement, Storage:Performance, Storage:Resiliency

Comments

@ashking94 (Member)

Is your feature request related to a problem? Please describe

With the optimised prefix pattern for remote store paths (see #12567), we need to know the path of each shard of every index that has ever been created, so that we can clean up any zombie data left behind by the async deletion of remote data after snapshot deletion (following index deletion) or by the async deletion of translog data.

Describe the solution you'd like

The proposal is to upload, at a fixed base path in the remote store, the paths for each combination of data and metadata files for translog and segment, so that all possible paths for a cluster can be looked up later. A purely illustrative sketch follows.
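A minimal sketch of the idea, assuming a hashed shard prefix in the spirit of #12567; the hash function, path layout, and all names here are illustrative assumptions, not the actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: shows why shard paths under an optimised (hashed)
// prefix cannot be derived by listing the repository, and how a fixed base
// path gives a well-known place to record them. All names are hypothetical.
public class FixedPathSketch {

    static String shardPath(String base, String indexUUID, int shardId, String category, String type) {
        // Stand-in for the optimised prefix of #12567; the real hash algorithm differs.
        String hash = Integer.toHexString((indexUUID + "/" + shardId + "/" + category + "/" + type).hashCode());
        return base + "/" + hash + "/" + indexUUID + "/" + shardId + "/" + category + "/" + type;
    }

    static String fixedPathsFile(String base, String indexUUID) {
        // Fixed, well-known location that records every resolved path for the index.
        return base + "/index-path/" + indexUUID;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int shard = 0; shard < 2; shard++) {
            for (String category : new String[] { "translog", "segments" }) {
                for (String type : new String[] { "data", "metadata" }) {
                    paths.add(shardPath("base", "uuid123", shard, category, type));
                }
            }
        }
        System.out.println("upload path list to: " + fixedPathsFile("base", "uuid123"));
        paths.forEach(System.out::println);
    }
}
```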

Related component

Storage:Performance

Describe alternatives you've considered

No response

Additional context

No response

@ashking94 (Member Author) commented Mar 14, 2024

There are a couple of options here for storing the paths for each shard of an index -

  1. Store them in the cluster state repo, putting all paths in a file named with the IndexUUID of the index. In this case we assume the user knows which repositories are used for segment and translog, and can disambiguate the paths for later deletion.
  2. Store the segment-related paths in the segment repo and the translog-related paths in the translog repo. Since translog and segments can use separate repositories, this means more files stored in the remote store.

Option 2 does not assume anything about the underlying repository and is hence the better approach. We can further optimise when the same repository backs both segment and translog: in that case a single file with the paths of all shards suffices. A sketch of both variants follows.
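A minimal sketch of option 2, assuming a hypothetical blob-store client; the interface and method names are illustrative, not the actual OpenSearch repository API:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Stand-in for a real blob-store client; not an OpenSearch class.
interface BlobRepository {
    void writeBlob(String blobName, byte[] content);
}

class IndexPathUploaderSketch {

    // Option 2: each repository stores the paths it owns, so nothing is
    // assumed about segment and translog sharing a repository.
    void uploadPaths(String indexUUID,
                     BlobRepository segmentRepo, BlobRepository translogRepo,
                     List<String> segmentPaths, List<String> translogPaths) {
        segmentRepo.writeBlob("index-path/" + indexUUID + "-segment",
            String.join("\n", segmentPaths).getBytes(StandardCharsets.UTF_8));
        translogRepo.writeBlob("index-path/" + indexUUID + "-translog",
            String.join("\n", translogPaths).getBytes(StandardCharsets.UTF_8));
    }

    // Optimisation mentioned above: when one repository backs both, a single
    // file with the paths of all shards is enough.
    void uploadPathsSameRepo(String indexUUID, BlobRepository repo,
                             List<String> segmentPaths, List<String> translogPaths) {
        repo.writeBlob("index-path/" + indexUUID,
            (String.join("\n", segmentPaths) + "\n" + String.join("\n", translogPaths))
                .getBytes(StandardCharsets.UTF_8));
    }
}
```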

ashking94 moved this from 🆕 New to 🏗 In progress in Storage Project Board on Apr 9, 2024
@ashking94 (Member Author)

I was considering MetadataCreateIndexService as the hook for index creation. However, I am ruling it out for the following reasons -

  1. The upload needs to happen synchronously for each index creation, and since the cluster manager batches cluster state updates, the cluster state update can time out when multiple indexes are created.
  2. The MetadataCreateIndexService methods are called in the context of just one index, so we cannot batch the paths of different indexes together into one file upload.

@ashking94 (Member Author) commented Apr 9, 2024

We have another option, described below. It is preferred because it lets us identify both index creation and the migration use case (document replication type to segment replication with remote store enabled).

  1. The index creation case is handled by using the previous version of the index metadata: if it is null, the index is being created for the first time (see the sketch after the snippet below).
  2. The migration case is also handled here. There is a method, writeFullMetadata, that gets called in very few cases, one of which is the very first time we enable the remote cluster state, detected using the previousClusterUUID field.
```java
public ClusterMetadataManifest writeIncrementalMetadata(
    ClusterState previousClusterState,
    ClusterState clusterState,
    ClusterMetadataManifest previousManifest
) throws IOException {
    final long startTimeNanos = relativeTimeNanosSupplier.getAsLong();
    if (clusterState.nodes().isLocalNodeElectedClusterManager() == false) {
        logger.error("Local node is not elected cluster manager. Exiting");
        return null;
    }
    // ...
```
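A minimal sketch of the detection described in point 1, assuming access to the previous and current cluster states; ClusterStateView and its methods are illustrative stand-ins, not the actual ClusterState API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: find indexes that are being created for the first time
// by checking whether their metadata existed in the previous cluster state.
class NewIndexDetectorSketch {

    interface ClusterStateView {
        List<String> indexNames();
        Object indexMetadata(String indexName); // null if the index did not exist
    }

    static List<String> newlyCreatedIndexes(ClusterStateView previous, ClusterStateView current) {
        List<String> created = new ArrayList<>();
        for (String index : current.indexNames()) {
            // Point 1 above: null previous index metadata means first-time
            // creation, so this index's paths need to be uploaded.
            if (previous == null || previous.indexMetadata(index) == null) {
                created.add(index);
            }
        }
        return created;
    }
}
```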

I discussed this with @soosinha as well. Let me raise a draft PR with the discussed change.
