Ability to obtain an index's name from its UUID. #13001

Closed
pakshi-titaniam opened this issue Apr 1, 2024 · 22 comments
Labels
enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing untriaged

Comments

@pakshi-titaniam

pakshi-titaniam commented Apr 1, 2024

Is your feature request related to a problem? Please describe

Portal26 (AWS Partner) builds and sells a plugin for OpenSearch. Among other things the plugin adds encryption at rest for the data written to the file system. Customers can specify different encryption keys for different indices. In order to do this, customers must maintain a mapping between an index's UUID and the encryption key that must be used for it.

Some of our customers run their SaaS business on top of OpenSearch. They tend to have hundreds of indices for each of their customers, and they host tens of such customers on one cluster. Maintaining a mapping from thousands of UUIDs to their corresponding encryption keys becomes error-prone.

They happen to have an index naming convention where the index names start with a unique_customer_name. They want to maintain a mapping from index name prefix to encryption key.

The Lucene Codec plugin that Portal26 implements does not receive the index name. Instead it receives a file system location that contains the index's UUID. Portal26 would like to get the index's name from its UUID.

This way our customers could maintain a mapping from index name prefix to encryption key, and we could translate the UUID to its index name and then retrieve the key.
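A minimal sketch of the prefix-based lookup described above, assuming the index name is already known. The class name `PrefixKeyResolver`, its methods, and the key identifiers are hypothetical and not part of OpenSearch or the Portal26 plugin; the longest-match rule is also an assumption about how overlapping prefixes should be handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps index-name prefixes to encryption key identifiers. */
final class PrefixKeyResolver {
    private final Map<String, String> keyIdByPrefix = new HashMap<>();

    /** Registers a key identifier for all indices whose name starts with the prefix. */
    void register(String indexNamePrefix, String keyId) {
        keyIdByPrefix.put(indexNamePrefix, keyId);
    }

    /** Returns the key id of the longest registered prefix that the index name starts with. */
    Optional<String> resolve(String indexName) {
        String bestPrefix = null;
        for (String prefix : keyIdByPrefix.keySet()) {
            boolean longer = bestPrefix == null || prefix.length() > bestPrefix.length();
            if (indexName.startsWith(prefix) && longer) {
                bestPrefix = prefix;
            }
        }
        return Optional.ofNullable(bestPrefix).map(keyIdByPrefix::get);
    }
}
```

With this shape, the remaining problem is exactly the one the issue raises: obtaining the index name to pass to `resolve` when only the UUID is available.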

Describe the solution you'd like

An internal API or method that accepts an index's UUID and returns the index's name.

Related component

Indexing

Describe alternatives you've considered

We have considered building and maintaining a mapping within the plugin. But since the plugin does not persist anything (it is stateless), it would have to build and maintain this mapping in memory, which could be an expensive operation. And every time a new index is added, the mapping would have to be updated, which could involve periodic polling that adds to the cluster workload.

Additional context

The customer who wants this is Pega systems. They asked us to tag them in this ticket. They will also voice their vote on the ticket.

@pakshi-titaniam pakshi-titaniam added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 1, 2024
@github-actions github-actions bot added the Indexing Indexing, Bulk Indexing and anything related to indexing label Apr 1, 2024
@reta
Collaborator

reta commented Apr 1, 2024

This mapping could be easily obtained using _cat APIs:

$ curl 'http://localhost:9200/_cat/indices?h=index,uuid'

index1 SbFwt5hhSviSj1YTFtwyEg
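As a rough illustration, the two-column output above could be turned into a UUID-to-name map as follows. This is a sketch that assumes the response is whitespace-separated with the index name first and the UUID second, as requested by `h=index,uuid`; the class and method names are hypothetical, and fetching the response is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the text returned by _cat/indices?h=index,uuid. */
final class CatIndicesParser {
    /** Parses lines of the form "<index> <uuid>" into a UUID -> index name map. */
    static Map<String, String> parseUuidToName(String catOutput) {
        Map<String, String> nameByUuid = new LinkedHashMap<>();
        for (String line : catOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] columns = trimmed.split("\\s+");
            if (columns.length < 2) {
                continue; // ignore lines that do not have both columns
            }
            nameByUuid.put(columns[1], columns[0]);
        }
        return nameByUuid;
    }
}
```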

@msfroh
Collaborator

msfroh commented Apr 1, 2024

From an internal API standpoint, if you have a reference to IndicesService, you can get it in a slightly roundabout way:

public static String resolveIndexName(IndicesService indicesService, String uuid) {
  // Only the UUID is known here, so the index name is left empty.
  Index indexForUuid = new Index("", uuid);
  IndexService indexService = indicesService.indexService(indexForUuid);
  if (indexService == null) {
    // Fail explicitly rather than hitting a NullPointerException below.
    // Could also call indicesService.indexServiceSafe(indexForUuid),
    // which will throw the exception for you.
    throw new IllegalArgumentException("No index found for UUID [" + uuid + "]");
  }
  return indexService.getIndexSettings().getIndex().getName();
}


@pakshi-titaniam
Author

Thanks @reta and @msfroh. One of our constraints is as follows.
When the OpenSearch cluster starts up, it reads each index (headers).
Are these methods available at the cluster start up time?
I think the first one (curl /_cat/indices) is not available at the cluster start up time. How about the second one (resolveIndexName)?
Thanks both.

@cwperks
Member

cwperks commented Apr 2, 2024

Would Plugin.onIndexModule be useful here?

I tried the snippet that @msfroh provided, using the GuiceHolder pattern in a plugin to obtain the IndicesService, and tried to resolve the index name within Plugin.createComponents and ClusterPlugin.onNodeStarted. Neither worked: the IndexService returned by indicesService.indexService(indexForUuid) was null at that point in execution.


Before the node is fully initialized, Plugin.onIndexModule is called for every index in the cluster, and there you can obtain each index's name and UUID. On Node bootstrap, you may see messages like:

[2024-04-02T00:13:33,394][INFO ][o.o.p.PluginsService     ] [smoketestnode] PluginService:onIndexModule index:[.opendistro_security/tk6k2nCkRJuc5Gr4mXUxEQ]

and this line directly comes from onIndexModule here:

logger.info("PluginService:onIndexModule index:" + indexModule.getIndex());

Example of security plugin overriding onIndexModule: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/OpenSearchSecurityPlugin.java#L664-L665
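One way a plugin might keep the name and UUID that onIndexModule exposes is a small thread-safe registry. The sketch below is self-contained and uses a hypothetical class `IndexNameRegistry`; the comment shows where the real hook would feed it, assuming `indexModule.getIndex()` returns an Index whose `getUUID()` and `getName()` accessors are used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory registry of index UUID -> index name. */
final class IndexNameRegistry {
    private final Map<String, String> nameByUuid = new ConcurrentHashMap<>();

    // In a plugin, this would be called from the onIndexModule override, e.g.:
    //   Index index = indexModule.getIndex();
    //   registry.register(index.getUUID(), index.getName());
    void register(String uuid, String indexName) {
        nameByUuid.put(uuid, indexName);
    }

    /** Returns the index name for the UUID, or null if that index has not been seen yet. */
    String nameFor(String uuid) {
        return nameByUuid.get(uuid);
    }
}
```

A lookup that returns null signals that the hook has not yet fired for that index, which is the timing question the rest of the thread turns on.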

@pakshi-titaniam
Author

@cwperks Thanks. This looks promising. I will run it by my engineering team. Will comment again in a few days. Much appreciated.

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@pakshi-titaniam It looks like this issue has been resolved. Please open a new issue if this is not the case.

@anto-tl

anto-tl commented Apr 22, 2024

@cwperks The onIndexModule method is useful for collecting the UUID-to-index-name mapping. However, on cluster startup/restart, the Lucene read path runs for existing indices (which is where we need the index name from the UUID) even before onIndexModule is called. Is there any other way to collect the UUID and index name?

An alternative could be to store this UUID-to-index-name mapping in persistent storage such as the file system, similar to how OpenSearch maintains cluster state, given permission to write to a file. On cluster startup, we could then read the mapping from the file and load it into the plugin as a map even before the Lucene read happens (in AbstractLifecycleComponent -> doStart).

Any pointers would be helpful.

@cwperks
Member

cwperks commented Apr 24, 2024

@anto-tl I do see this when I look in the data directory of a node:

> cat data/nodes/0/indices/WUqRmN4DQtKFE5dtOSpb0A/_state/state-8.st
(binary output; readable fragments shown)
state: .address-book, version, mapping_version, settings_version, aliases_version, routing_num_shards, state: open
settings: index.creation_date 1713919726108, index.number_of_replicas 1, index.number_of_shards 1, index.provided_name .address-book, index.replication.type DOCUMENT, index.uuid WUqRmN4DQtKFE5dtOSpb0A, index.version.created 137217827
mappings, aliases, primary_terms, in_sync_allocations, rollover_info, system

Where WUqRmN4DQtKFE5dtOSpb0A is the UUID of the .address-book index:

> curl -XGET http://localhost:9200/_cat/indices
yellow open .address-book WUqRmN4DQtKFE5dtOSpb0A 1 1 4 0 15kb 15kb

It is on disk, but I'm not sure of how to read that in from a plugin before onIndexModule is called.
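For reference, pulling the UUID out of a path like the one shown above could look like the sketch below. It assumes the UUID is always the path element directly after a directory named `indices`, which matches the path in this comment but is an assumption about the data directory layout in general; the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper that extracts an index UUID from a data-directory path. */
final class IndexPathUuid {
    /** Returns the path element that follows "indices", if there is one. */
    static Optional<String> uuidFromPath(Path path) {
        // Stop one element early so that getName(i + 1) is always valid.
        for (int i = 0; i < path.getNameCount() - 1; i++) {
            if ("indices".equals(path.getName(i).toString())) {
                return Optional.of(path.getName(i + 1).toString());
            }
        }
        return Optional.empty();
    }
}
```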

@anto-tl

anto-tl commented Apr 25, 2024

@cwperks Thanks for checking the details.

  • I found that the OpenSearch service is partially up before onIndexModule is called, at the point where our Lucene-related code runs. So I tried calling /_cat/indices and got the UUID-to-index mapping.
  • With a single node, using that node's own IP, this is fine. With multiple nodes behind a load balancer URL, the API call can go to any available node during restart/bootstrap. It is not guaranteed that all nodes are up at that point, so the call might fail with service unavailable.

So to avoid complexity, I am looking for a way to make an internal call from plugin code to get the /_cat/indices response, without making an external API call to an IP address.

I found that the RestIndicesAction class is used for the _cat/indices call internally. Any idea how to call this class from a plugin to get the _cat/indices response details without making an external API call?

@anto-tl

anto-tl commented Apr 30, 2024

@cwperks Any idea? I have also tried multiple listeners, such as ClusterStateListener. They only have the information after the state is loaded. Most of the samples I have tried do not provide the necessary information before the Lucene loading is completed and the cluster state has changed.

@cwperks
Member

cwperks commented Apr 30, 2024

I was trying to take a deep dive to see how the files are read from disk on node bootstrap, but I haven't been able to fully grok the code path.

@reta @dblock @msfroh any other ideas for getting a full list of index names and UUIDs before a cluster has fully initialized?

@reta
Collaborator

reta commented May 1, 2024

@anto-tl could you please clarify what you mean by

the lucene reading related things are called for existing indices (where we required the indexName from uuid) - called even before onIndexModule. Is there any other way to collect the uuid and indexName ?

@anto-tl

anto-tl commented May 2, 2024

@reta

  • On cluster restart I have noticed that the code flow goes to the following pieces of code

Calling: Store.tryOpenIndex

https://github.com/opensearch-project/OpenSearch/blob/2.9.0/server/src/main/java/org/opensearch/gateway/TransportNodesListGatewayStartedShards.java#L185

Calling: Lucene.readSegmentInfos

https://github.com/opensearch-project/OpenSearch/blob/2.9.0/server/src/main/java/org/opensearch/index/store/Store.java#L602
...
...

  • At this point we can get the UUID from the file system location. Here we want to get the index name from the UUID.
  • The problem with onIndexModule is that it is called after the index is read by Lucene, so we cannot collect the UUID-to-index-name mapping in the restart scenario.
  • As mentioned above, one solution we are considering is to make an internal call from plugin code to obtain the _cat/indices response, since the index name and UUID mapping is available at this point. I have already explained the problem with an external _cat/indices call here, so we would like to get the information from the currently running node itself.
  • Is there a way to get an instance of RestIndicesAction from plugin code and get the _cat/indices response without making an external API call?

Note:
I have also tried to get the index information from the clusterService and indicesService references, but index information is not loaded at this point. Only after the Lucene segment read has completed and the clusterChanged event has fired can I get the information. We need the UUID-to-index-name information before that point.

@reta
Collaborator

reta commented May 3, 2024

Ah I see, thanks @anto-tl for the detailed explanation ... The Store method is passed a ShardId, which in turn holds a reference to the index name and UUID. Shouldn't that be sufficient? I think there is no mechanism to propagate this contextual information down the line; is that the problem you are running into?

The sequence of initialization looks valid to me: the index has to be initialized before the onIndexModule call. That may be too late for you, since you apparently need the information at the codec level.

@anto-tl

anto-tl commented May 6, 2024

Hello @reta, thanks for the pointer on ShardId. I need to check how I can hook into this from a plugin and grab the information from Store -> shardId. Let me take a look and let you know.

@anto-tl

anto-tl commented May 9, 2024

Ah I see, thanks @anto-tl for the detailed explanation ... The Store method is passed a ShardId, which in turn holds a reference to the index name and UUID. Shouldn't that be sufficient? I think there is no mechanism to propagate this contextual information down the line; is that the problem you are running into?

The sequence of initialization looks valid to me: the index has to be initialized before the onIndexModule call. That may be too late for you, since you apparently need the information at the codec level.

@reta Is there a way to get the shardId information in a plugin when the TransportNodesListGatewayStartedShards->nodeOperation method is called? I am trying to find a way to extend this class and override the nodeOperation method to get the shardId information (or to get it from Store.tryOpenIndex). I am still not sure how we can register or use this in a plugin. Any idea how we can achieve this?

@reta
Collaborator

reta commented May 9, 2024

Still not sure how can we register or use this in plugin to use that. Any idea how can we achieve this?

@anto-tl I don't think you could alter TransportNodesListGatewayStartedShards::nodeOperation or Store.tryOpenIndex in any way, those are static methods. Here is another idea: since you need it at the codec level, the CodecService is supplied with IndexSettings, which also has the index details. Plus, you could provide your own using EnginePlugin::getCustomCodecServiceFactory.

Besides that, I think we are getting to the end of the possible options; it seems the feature you are working on needs to be looked at in order to suggest a path forward.

@anto-tl

anto-tl commented May 9, 2024

@reta Will take a look.
Also, is there any option for calling the _cat/indices logic internally, via code on the same node, without making an external API call from plugin code? If we can call the same node to get the details, that is fine.

I explained the multi-node problem here, and this is why we want to avoid an external API call. If mTLS is enabled, or the call goes to another node on startup, we cannot get the needed _cat/indices response.

@reta
Collaborator

reta commented May 9, 2024

Also, Is there any option for calling _cat/indices logic internally via code in the same node without doing an external API call in plugin code?

I thought you ran into an initialization sequence problem here, where the cluster was not ready to handle requests when you called the API? In any case, createComponents provides the Client instance:

    public Collection<Object> createComponents(
        Client client,
        ClusterService clusterService,
        ThreadPool threadPool,
        ResourceWatcherService resourceWatcherService,
        ScriptService scriptService,
        NamedXContentRegistry xContentRegistry,
        Environment environment,
        NodeEnvironment nodeEnvironment,
        NamedWriteableRegistry namedWriteableRegistry,
        IndexNameExpressionResolver indexNameExpressionResolver,
        Supplier<RepositoriesService> repositoriesServiceSupplier
    ) {
 ...
}

The Client is an instance of NodeClient (sadly, this may need a type check), which allows local execution: NodeClient::executeLocally

@anto-tl

anto-tl commented May 14, 2024

@reta Thanks. If possible, could you share a code snippet showing how to call _cat/indices with client.executeLocally, i.e. how to build the ActionType, ActionRequest, and ActionListener for the _cat/indices request? I see different samples in the OpenSearch code for client.executeLocally, but I am not sure how to build one for _cat/indices, since there is no specific ActionType instance or request in the RestIndicesAction class.

Also, one more clarification: when we use the client.executeLocally() method from plugin code and OpenSearch has basic auth/mTLS enabled, does any authentication need to be passed, as we do for an external API call? (I am assuming client.executeLocally is an internal call and does not require any authentication to be passed.)

@anto-tl

anto-tl commented May 14, 2024

@reta I have finally found a way to get the index name and UUID mapping on startup, using the code below.

Works

  • Get all indices settings and collected the index name uuid information.
client.admin().indices().getSettings(new GetSettingsRequest()).actionGet().getIndexToSettings().values();

Did not work

  • I also tried the following to get the indices stats, but the indicesStatsResponse.getIndices() map is empty at startup, so I could not use it:
// Create a request to get indices stats
IndicesStatsRequest indicesStatsRequest = new IndicesStatsRequest();
indicesStatsRequest.indices();
indicesStatsRequest.indicesOptions(IndicesOptions.lenientExpandHidden());
indicesStatsRequest.all();
indicesStatsRequest.includeUnloadedSegments(false);

ActionListener<IndicesStatsResponse> actionListener = new ActionListener<>() {
    @Override
    public void onResponse(IndicesStatsResponse indicesStatsResponse) {
        Map<String, IndexStats> indices = indicesStatsResponse.getIndices();
        log.info("Indices: {}", indices);
    }

    @Override
    public void onFailure(Exception e) {
        log.error("error", e);
    }
};

client.executeLocally(IndicesStatsAction.INSTANCE, indicesStatsRequest, actionListener);
  • Also tried this
client.admin().indices().prepareStats().all().get().getIndices(); // always returns 0 size
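
The getSettings approach reported as working above boils down to reading the index.uuid setting of each index and inverting the result. The sketch below shows only that inversion step, using plain maps as stand-ins for the response (index name to flat settings); it does not use the OpenSearch Settings type, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that inverts "index name -> settings" into "UUID -> index name". */
final class SettingsInverter {
    // Setting key under which the index UUID is stored (visible in the state file dump earlier in the thread).
    static final String UUID_SETTING = "index.uuid";

    static Map<String, String> nameByUuid(Map<String, Map<String, String>> settingsByIndexName) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> entry : settingsByIndexName.entrySet()) {
            String uuid = entry.getValue().get(UUID_SETTING);
            if (uuid != null) {
                result.put(uuid, entry.getKey()); // skip indices whose settings carry no UUID
            }
        }
        return result;
    }
}
```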

@reta
Collaborator

reta commented May 14, 2024

@reta Finally I have found one way to get indexName and uuid mapping on the startup by below code

This is great, @anto-tl. I haven't looked into client.executeLocally, but I suspect it is not relevant anymore. Thanks a lot for the update.

6 participants