[BUG] Batch async shard fetch holds up significant memory causing OOMs #14452

Bukhtawar · 2024-06-19T14:48:21Z

Describe the bug

During batch async shard fetch, up to 8k node responses were observed being cached simultaneously, with a large number (4k) of retention leases per store metadata. This bloated overall memory usage and eventually led to an OOM.
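
A rough back-of-envelope sketch of why this can exhaust the heap. The response and lease counts come from the observation above and the heap size from the reproduction steps. The per-lease footprint (`bytesPerLease`) is a purely hypothetical value chosen for illustration, not a measurement, and the class and method names are made up for this sketch.

```java
// Back-of-envelope estimate only; not taken from the OpenSearch code base.
final class ShardFetchMemoryEstimate {

    /** Bytes retained if every cached response holds leasesPerResponse leases of bytesPerLease each. */
    static long estimateBytes(long cachedResponses, long leasesPerResponse, long bytesPerLease) {
        // multiplyExact throws ArithmeticException on overflow instead of silently wrapping.
        return Math.multiplyExact(Math.multiplyExact(cachedResponses, leasesPerResponse), bytesPerLease);
    }

    public static void main(String[] args) {
        long cachedResponses = 8_000;    // from the report: up to 8k node responses cached at once
        long leasesPerResponse = 4_000;  // from the report: about 4k retention leases per store metadata
        long bytesPerLease = 200;        // ASSUMPTION: hypothetical on-heap footprint of one lease
        long heapBytes = 16L * 1024 * 1024 * 1024; // 16G heap from the reproduction steps

        long retained = estimateBytes(cachedResponses, leasesPerResponse, bytesPerLease);
        double shareOfHeap = 100.0 * retained / heapBytes;
        System.out.printf("retained bytes: %d (%.1f%% of heap)%n", retained, shareOfHeap);
    }
}
```

Under these assumptions the cached leases alone would retain about 6.4 GB (8,000 x 4,000 x 200 bytes), a large share of a 16G heap before any other cluster manager state is counted. The real figure depends on the actual per-lease footprint and on how many leases each cached response carries.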

Related component

Cluster Manager

To Reproduce

Create a 200-node cluster with 100k shards and a 16G heap on the cluster manager nodes
Restart all nodes of the cluster
Observe the leader (the elected cluster manager) going OOM

Expected behavior

The cluster manager shouldn't run out of memory, since that puts cluster availability at risk

Bukhtawar added the bug and untriaged labels Jun 19, 2024

github-actions bot added the Cluster Manager label Jun 19, 2024

Bukhtawar removed the untriaged label Jun 19, 2024

github-project-automation bot added this to Cluster Manager Project Board Jun 19, 2024

Bukhtawar removed the Cluster Manager label Jun 19, 2024

github-project-automation bot moved this to 🆕 New in Cluster Manager Project Board Jun 19, 2024

Bukhtawar added the ShardManagement:Resiliency label Jun 19, 2024

github-project-automation bot added this to Shard Management Project Board Jun 19, 2024

github-project-automation bot moved this to 🆕 New in Shard Management Project Board Jun 19, 2024

Bukhtawar added the Cluster Manager label Jun 19, 2024

Bukhtawar changed the title ~~[BUG] Batch async shard fetch holds up too significant memory causing OOMs~~ [BUG] Batch async shard fetch holds up significant memory causing OOMs Jun 20, 2024
