[BUG] Batch async shard fetch holds up significant memory causing OOMs #14452

Open
Bukhtawar opened this issue Jun 19, 2024 · 0 comments

@Bukhtawar
Collaborator

Describe the bug

During batch async shard fetch, up to 8k node responses were observed to be cached simultaneously, with a large number (4k) of retention leases per store metadata. This bloated overall memory usage and eventually led to an OOM.
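
To give a sense of scale, the figures above can be turned into a rough back-of-envelope estimate. This is only a sketch: the per-lease heap cost (`ASSUMED_BYTES_PER_LEASE`) and the count of one store metadata per cached response (`metadata_per_response`) are assumptions made for illustration, not values measured in this issue, and the function name is hypothetical.

```python
def estimate_lease_heap_bytes(cached_responses, leases_per_metadata,
                              bytes_per_lease, metadata_per_response=1):
    """Rough heap held by retention leases across all cached node responses."""
    return (cached_responses * metadata_per_response
            * leases_per_metadata * bytes_per_lease)


# Figures reported in this issue.
CACHED_RESPONSES = 8_000      # node responses cached simultaneously
LEASES_PER_METADATA = 4_000   # retention leases per store metadata
HEAP_BYTES = 16 * 1024 ** 3   # 16G heap on cluster manager nodes

# ASSUMPTION: ~200 bytes per lease (id and source strings, two longs,
# object headers). The real cost depends on the JVM and the lease contents.
ASSUMED_BYTES_PER_LEASE = 200

total_leases = CACHED_RESPONSES * LEASES_PER_METADATA
estimate = estimate_lease_heap_bytes(
    CACHED_RESPONSES, LEASES_PER_METADATA, ASSUMED_BYTES_PER_LEASE
)
fraction = estimate / HEAP_BYTES

print(f"retention lease objects held: {total_leases:,}")
print(f"estimated heap for leases:    {estimate / 1024 ** 3:.2f} GiB")
print(f"share of a 16G heap:          {fraction:.0%}")
```

Under these assumptions, the leases alone would take up a large share of the heap before any other cluster state is counted. If a batched response carries store metadata for several shards, raising `metadata_per_response` scales the estimate linearly.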

Screenshot 2024-05-31 at 5 00 07 AM

Related component

Cluster Manager

To Reproduce

  1. Create a 200-node cluster with 100k shards and a 16 GB heap on the cluster manager nodes
  2. Restart all nodes of the cluster
  3. Observe the leader going OOM

Expected behavior

The cluster manager shouldn't run out of memory, as that puts cluster availability at risk.


@Bukhtawar Bukhtawar added the bug (Something isn't working) and untriaged labels Jun 19, 2024
@Bukhtawar Bukhtawar changed the title from "[BUG] Batch async shard fetch holds up too significant memory causing OOMs" to "[BUG] Batch async shard fetch holds up significant memory causing OOMs" Jun 20, 2024