Should _cat API calls cause index refresh? #11225

Jon-AtAWS · 2023-11-15T23:25:40Z

Is your feature request related to a problem? Please describe.
The doc count reported by _cat/indices lags reality when shards are in the idle state because they have received no queries.

Describe the solution you'd like
Calls to the _cat APIs should trigger shard refresh. At least, for calls like _cat/indices that expose shard statistics and metrics.

Describe alternatives you've considered
A call to the _count or other query API partially fixes the problem. However, it won't trigger refresh on shards not queried (e.g., replicas or primaries that are not queried). Subsequent _cat/indices calls can hit non-refreshed shards.

Since the _cat API is administrative, it won’t cause too much perf degradation in normal operating mode.
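To make the lag described above concrete, here is a minimal Python sketch that reads the doc count from _cat/indices, compares it with _count, and then forces an explicit refresh as a workaround. The helper names (`http_json`, `doc_count_gap`, `refresh_then_cat`) and the injected `call` function are illustrative assumptions, not part of any OpenSearch client; only the `_cat/indices?format=json`, `_count`, and `_refresh` endpoints come from the product itself.

```python
import json
from urllib.request import Request, urlopen


def http_json(base_url, method, path):
    """Send one request to the cluster and decode the JSON response body."""
    req = Request(base_url.rstrip("/") + path, method=method)
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def doc_count_gap(index, call):
    """Return (cat_count, search_count) for one index.

    `call(method, path)` is any function that returns decoded JSON,
    for example a lambda wrapping `http_json` (hypothetical wiring).
    _cat/indices is read first because the _count query may itself
    wake the shards it touches.
    """
    cat_rows = call("GET", f"/_cat/indices/{index}?format=json")
    cat_count = int(cat_rows[0]["docs.count"])
    search_count = call("GET", f"/{index}/_count")["count"]
    return cat_count, search_count


def refresh_then_cat(index, call):
    """Force a refresh on the index, then read the _cat/indices doc count."""
    call("POST", f"/{index}/_refresh")
    rows = call("GET", f"/_cat/indices/{index}?format=json")
    return int(rows[0]["docs.count"])
```

Against a real cluster, `call` could be `lambda m, p: http_json("http://localhost:9200", m, p)`, where the URL is an assumption about a local, unsecured test node.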

andrross · 2023-11-16T03:25:50Z

Honestly the shard idle optimization really seems like a lot of trouble. It's a nice optimization for the system to automatically optimize for bulk load scenarios that happen when no searches are happening, but I'm curious how common that actually is. The idle behavior is somewhat antithetical to other availability tenets like predictable performance (i.e. your system may work well because shards go idle, but then something changes to start sending sporadic search traffic and now ingestion starts failing because you were unknowingly dependent on the shards being idle).

Jon-AtAWS · 2023-11-16T03:44:57Z

@andrross - I understand and agree about predictable performance. Problem is, it was sometimes predictably bad performance. In the bad old days, before this optimization, we saw a 25-50% increase in throughput from adjusting refresh_interval from 1s up to a minute. Now that's more like 10%, best case. In other words, while you could set refresh_interval low, it would hurt you.
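For readers who have not tuned this setting, the adjustment mentioned above is an index settings update. The sketch below only builds the request; the helper name `refresh_interval_request` and the index name in the usage note are hypothetical. As far as I know, explicitly setting refresh_interval also takes the index out of the search-idle behavior, so treat this as an illustration of the knob rather than a recommendation.

```python
import json


def refresh_interval_request(index, interval):
    """Build (method, path, body) for updating index.refresh_interval.

    `interval` is a time value string such as "1s" or "60s".
    """
    body = json.dumps({"index": {"refresh_interval": interval}})
    return "PUT", f"/{index}/_settings", body
```

For example, `refresh_interval_request("logs-2023", "60s")` yields a PUT to `/logs-2023/_settings` that raises the interval to one minute.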

And, relying on load-only metrics is a problem. Actually, AFAIK, the OSB workloads don't mix queries and indexing - all of the indexing is up front. We need a test workload that runs mixed query/indexing for exactly this reason. Mixed workloads are the hardest to scale for, since you have competing concerns that have different load characteristics (e.g., fewer shards are better for query, more shards are better for indexing).

So, letting shards idle is a good optimization, especially for logs workloads that can actually go long periods without queries. I think we need an expanded definition of "query" that includes (some? all?) _cat APIs. And the query should wake shards and replicas to mitigate against inconsistent results.

rishabh6788 · 2023-11-16T07:16:06Z

@Jon-AtAWS Regarding OSB workloads supporting indexing and search in parallel, you can use the pmc workload. We recently added an indexing-querying test procedure that does this. Use the --test-procedure="indexing-querying" OSB parameter with the PMC workload.

msfroh · 2023-11-16T17:51:12Z

> Honestly the shard idle optimization really seems like a lot of trouble.

See also #9707

andrross · 2023-11-16T18:16:26Z

Thanks @Jon-AtAWS, the historical perspective is super helpful.

Regarding the specific request here, treating _cat APIs like queries with regard to waking shards up makes a lot of sense. Returning very stale data is not a good experience. The biggest risk I see is that any deployments that have external automated monitoring systems (e.g. managed services) polling the _cat APIs could effectively disable the shard idle optimization with this change.

Jon-AtAWS · 2023-11-16T18:50:59Z

Thanks @rishabh6788 - didn't know that! I'll go play with it.

@andrross - agreed, that's a tradeoff of choosing to poll admin APIs. Having said that, it's usual (? I don't have statistics, but I suspect...) to poll at 30s-1m intervals, so the impact should be pretty low to non-existent. And, I would argue that if you're polling these APIs you actually want accurate results. We can choose which APIs should wake shards and try to minimize as well.

Apart from _cat APIs, we should consider _stats, _nodes/stats, <index>/_stats, etc. for waking shards if they don't already.

I'm pretty sure cluster health should not wake up shards, but you could convince me...

andrross · 2023-11-16T19:51:05Z

> to poll at 30s-1m intervals, so the impact should be pretty low to non-existent

Given the default idle time of 30s, if you're polling at 30s then you would prevent shards from ever going idle. However, the point stands that if you're polling these APIs then you probably do want accurate results!
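The arithmetic behind that point can be sketched as below. This is a simplified model, not OpenSearch's actual scheduling logic: it assumes every poll counts as a search access that resets the idle timer (the change proposed in this issue), and that a shard becomes idle exactly `idle_after_s` seconds after the last access, matching the 30s default of `index.search.idle.after`. The function names are hypothetical.

```python
def idle_seconds_per_cycle(poll_interval_s, idle_after_s=30.0):
    """Seconds in each polling cycle during which a shard would be idle."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return max(0.0, poll_interval_s - idle_after_s)


def idle_fraction(poll_interval_s, idle_after_s=30.0):
    """Fraction of wall-clock time a shard spends idle under steady polling."""
    return idle_seconds_per_cycle(poll_interval_s, idle_after_s) / poll_interval_s
```

Under this model, polling every 30s leaves no idle time at all, while polling every 60s still leaves the shard idle for half of each cycle.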

Agree that cluster health need not wake shards, but any API that reports information dependent on indexed data probably should.
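One way to picture the rule being discussed is a small path classifier. This is a hypothetical policy sketch, not existing OpenSearch behavior: the function name `should_wake_shards` and the particular set of _cat endpoints chosen are assumptions, since the thread leaves open which _cat APIs would qualify.

```python
# Assumed subset of _cat endpoints whose output depends on indexed data.
DATA_DEPENDENT_CAT = {"indices", "shards", "count", "segments"}


def should_wake_shards(path):
    """Return True if a request path reports data-dependent statistics."""
    segments = [s for s in path.split("?")[0].strip("/").split("/") if s]
    if not segments:
        return False
    if segments[0] == "_cluster":
        # Cluster health and other cluster-level calls do not wake shards.
        return False
    if segments[0] == "_cat":
        return len(segments) > 1 and segments[1] in DATA_DEPENDENT_CAT
    if segments[0] == "_stats" or segments[:2] == ["_nodes", "stats"]:
        return True
    # <index>/_stats
    return len(segments) >= 2 and segments[1] == "_stats"
```

With this sketch, `/_cat/indices?v` and `/logs/_stats` would wake shards, while `/_cluster/health` and `/_cat/nodes` would not.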

Jon-AtAWS added the enhancement and untriaged labels Nov 15, 2023

msfroh added the Search and Indexing labels Nov 16, 2023

github-project-automation bot added this to Search Project Board Nov 16, 2023

github-project-automation bot moved this to 🆕 New in Search Project Board Nov 16, 2023

msfroh removed the untriaged label Nov 16, 2023

andrross mentioned this issue Mar 6, 2024

[BUG] Continuously index documents in opensearch, but there is no change in the number of documents queried through the interface #12123

Closed

getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024
