[Paginating ClusterManager Read APIs] Paginate _cat/segments API. #14259

gargharsh3134 · 2024-06-13T05:21:56Z

Is your feature request related to a problem? Please describe

As the number of indices and data/segments grow in an opensearch cluster, the response size and the latency of the _cat/segments API increases, which makes it difficult not only for the client to consume such large responses, but also stresses out the cluster by making it accumulate segment stats across all the indices.

Describe the solution you'd like

Blocking the API for large clusters.
Given that opensearch cluster can be used to store large amount of data, the number of segments can be very high (for eg, 1.6 million segments for 80TB data across 5k indices). If there isn't any strong use-case of the API for any monitoring activity, it can very well be blocked for large clusters to prevent unnecessary usage of nodes' resources.
Pagination:
If the API is not to be blocked, one of the proposal to paginate will depend on the way _cat/shards API is paginated. A list of shards can be generated for a requested page (as per the finalized approach of _cat/shards API) and all the segments belonging to those shards can be displayed in a single page.
The pageSize parameter would not be very well defined in that case. Instead of it directly being, number of segments in a page, it would rather be number of shards corresponding to which all the segments will be displayed.
This would also be susceptible to skewness in the cluster. If shards having relatively more data/segments get picked up in same page, response sizes can significantly vary.

Related component

Cluster Manager

Describe alternatives you've considered

No response

Additional context

No response

dblock · 2024-07-01T16:32:34Z

[Catch All Triage - Attendees 1, 2, 3, 4, 5]

msfroh · 2024-07-01T22:55:32Z

Right now, a lot of the /_cat APIs work via a scatter-gather across all nodes. Essentially, the request hits a node, it fans out to all nodes, then stitches the responses together.

One option for pagination might involve limiting the number of nodes that get queried per request.

dblock · 2024-07-02T17:50:13Z

Maybe there's a way to redesign this as a "walk the cluster" in a predictable sequence with a cursor that can remember its place even with cluster changes, therefore keeping pagination stable, and making no more than N fan-out requests ahead at any given time to gather more data.

Bukhtawar · 2024-07-02T17:59:08Z

I think we could fan out to all the nodes for the first request, create checkpoint of the point in time state and then from there on serve batch node responses ensuring that all nodes are using the same point in time state. This could be costly for certain APIs though so we really need a trade-off of consistency vs cost

backslasht · 2024-07-11T13:14:00Z

I see the conversation is aligned towards pagination based on checkpoints. Alternatively, can we make it filter based? and the filter can be based on index, shard, node, etc. With filters, server need not retain the checkpoint and it is also simpler to implement.

Thoughts?

backslasht · 2024-07-11T13:23:31Z

I see index name or index regex is already supported as target (filter). Can it be extended to shards and nodes?

shwetathareja · 2024-07-12T10:10:49Z

create checkpoint of the point in time state and then from there on serve batch node responses ensuring that all nodes are using the same point in time state.

Cluster state today is point in time information. Maintaining checkpoints would be very expensive, overhead will increase depending on state changes happening in the cluster. These APIs are used for diagnostic purpose. I feel adding checkpoints of cluster state will be overkill.

Alternatively, can we make it filter based?

The below API already support filtering on indices:
_cat/shards
_cat/segments
_cat/indices

But, the major problem is user don't always know what they are looking for and default response in large cluster would be very big for these APIs. We are thinking about adding more aggressive filtering support like node level in admin APIs separately but filtering only may not be sufficient. One thing I was thinking earlier was can we do implicit pagination like based on node. One challenge is node can leave and join the cluster and then the API responses could get confusing. Like in _cat/shards or _cat/segments API, a shard can appear multiple times if it was reassigned to a different node during pagination was in-progress. Paginating on indices with creation time can provide a more stable and deterministic pagination.

shwetathareja · 2024-07-15T07:18:21Z

By the way _cat/segments is not widely used API, pagination may not be P0 for this one. For this API, to start with we can enforce aggressive filtering but for _cat/shards pagination would be important.

gargharsh3134 added enhancement Enhancement or improvement to existing feature or request untriaged labels Jun 13, 2024

github-actions bot added the Cluster Manager label Jun 13, 2024

github-project-automation bot added this to Cluster Manager Project Board Jun 13, 2024

github-project-automation bot moved this to 🆕 New in Cluster Manager Project Board Jun 13, 2024

dblock removed the untriaged label Jul 1, 2024

rwali-aws added the v2.16.0 Issues and PRs related to version 2.16.0 label Jul 11, 2024

gargharsh3134 mentioned this issue Jul 29, 2024

[META] Paginating _cat APIs #15014

Open

3 tasks

github-project-automation bot added this to OpenSearch Project Roadmap Aug 30, 2024

github-project-automation bot moved this to 2.17 (First RC 09/03, Release 09/17) in OpenSearch Project Roadmap Aug 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Paginating ClusterManager Read APIs] Paginate _cat/segments API. #14259

[Paginating ClusterManager Read APIs] Paginate _cat/segments API. #14259

gargharsh3134 commented Jun 13, 2024 •

edited

Loading

dblock commented Jul 1, 2024

msfroh commented Jul 1, 2024

dblock commented Jul 2, 2024

Bukhtawar commented Jul 2, 2024

backslasht commented Jul 11, 2024

backslasht commented Jul 11, 2024

shwetathareja commented Jul 12, 2024 •

edited

Loading

shwetathareja commented Jul 15, 2024

[Paginating ClusterManager Read APIs] Paginate _cat/segments API. #14259

[Paginating ClusterManager Read APIs] Paginate _cat/segments API. #14259

Comments

gargharsh3134 commented Jun 13, 2024 • edited Loading

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Related component

Describe alternatives you've considered

Additional context

dblock commented Jul 1, 2024

msfroh commented Jul 1, 2024

dblock commented Jul 2, 2024

Bukhtawar commented Jul 2, 2024

backslasht commented Jul 11, 2024

backslasht commented Jul 11, 2024

shwetathareja commented Jul 12, 2024 • edited Loading

shwetathareja commented Jul 15, 2024

gargharsh3134 commented Jun 13, 2024 •

edited

Loading

shwetathareja commented Jul 12, 2024 •

edited

Loading