
[BUG] Cluster Health API call can get tripped by circuit breaker #631

Open
Bukhtawar opened this issue Apr 28, 2021 · 13 comments
Labels: bug (Something isn't working), Cluster Manager

Comments

@Bukhtawar
Collaborator

Describe the bug
When JVM memory pressure is high, calls to cluster health might fail with:

[2021-04-05T17:37:46,637][INFO ][c.a.c.e.logger           ] [cc0fd770314ce44c33bedf35605e9c4d] GET /_cluster/health local=true 429 TOO_MANY_REQUESTS 865 1
[2021-04-05T17:37:46,631][INFO ][c.a.c.e.logger           ] [cc0fd770314ce44c33bedf35605e9c4d] GET /_cluster/health local=true 429 TOO_MANY_REQUESTS 865 0
[2021-04-05T17:37:44,838][INFO ][c.a.c.e.logger           ] [cc0fd770314ce44c33bedf35605e9c4d] GET /_cluster/health local=true 429 TOO_MANY_REQUESTS 865 0
[2021-04-05T17:37:44,838][INFO ][c.a.c.e.logger           ] [cc0fd770314ce44c33bedf35605e9c4d] GET /_cluster/health local=true 429 TOO_MANY_REQUESTS 865 0
{
    "error": {
        "root_cause": [
            {
                "type": "circuit_breaking_exception",
                "reason": "[parent] Data too large, data for [<http_request>] would be [2029039272/1.8gb], which is larger than the limit of [2023548518/1.8gb], real usage: [2029039272/1.8gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=5285/5.1kb, in_flight_requests=0/0b, accounting=50225284/47.8mb]",
                "bytes_wanted": 2029039272,
                "bytes_limit": 2023548518,
                "durability": "PERMANENT"
            }
        ],
        "type": "circuit_breaking_exception",
        "reason": "[parent] Data too large, data for [<http_request>] would be [2029039272/1.8gb], which is larger than the limit of [2023548518/1.8gb], real usage: [2029039272/1.8gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=5285/5.1kb, in_flight_requests=0/0b, accounting=50225284/47.8mb]",
        "bytes_wanted": 2029039272,
        "bytes_limit": 2023548518,
        "durability": "PERMANENT"
    },
    "status": 429
}
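
For anyone debugging an occurrence like this, a minimal way to see which breaker is under pressure (a sketch, assuming the node is still reachable over plain HTTP on localhost:9200):

curl -s "localhost:9200/_nodes/stats/breaker?pretty"                              # per-breaker limits and estimated usage
curl -s "localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max"   # per-node heap pressure

The first call reports each breaker's configured limit and current estimate; the second shows heap usage, which is what the parent breaker tracks when real-memory accounting is enabled.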

Expected behavior
Cluster health calls shouldn't get tripped by the circuit breaker: they are important, informative, and represent the state of the system.

Bukhtawar added the bug label on Apr 28, 2021
@tlfeng
Collaborator

tlfeng commented May 4, 2021

Hi @Bukhtawar,

Could you explain more about how to reproduce the issue?
It looks like this was fixed in Elasticsearch 5.0 (elastic/elasticsearch@f32b700); in addition, requests to / were also whitelisted from the circuit breaking exception in Elasticsearch 6.5 (elastic/elasticsearch@027a22a).

During my own testing, I didn't find the Cluster Health API call being tripped by the circuit breaker.
My steps (also collected into a single runnable snippet after the list):

  1. Start OpenSearch beta1 in Ubuntu with default setting.
  2. Set the parent circuit breaker to a low limit: curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent" : {"indices.breaker.total.limit" : "5%"}}'
  3. Check the heap usage with curl "localhost:9200/_cat/nodes?h=heap*&v"; found "circuit_breaking_exception" in the response.
  4. Check the cluster health with curl "localhost:9200/_cluster/health?pretty"; got the expected response without error.
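
The same steps collected into a single snippet (same commands as above, assuming a local single-node setup reachable over plain HTTP on localhost:9200):

# Step 2: lower the parent circuit breaker so it trips easily
curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent" : {"indices.breaker.total.limit" : "5%"}}'

# Step 3: check heap usage; this response is expected to contain circuit_breaking_exception
curl "localhost:9200/_cat/nodes?h=heap*&v"

# Step 4: check cluster health; this returned the normal health response without error
curl "localhost:9200/_cluster/health?pretty"

# Cleanup: restore the default parent breaker limit (this request may itself be rejected
# with 429 while heap usage is above the lowered limit)
curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent" : {"indices.breaker.total.limit" : null}}'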

@anshul291995

Looking into reproducing this issue. Will update.

@dblock
Member

dblock commented Jul 16, 2021

@anshul291995 @Bukhtawar any updates here? What should we do with this?

@Bukhtawar
Collaborator Author

We'll need to try to repro here. I'll see if I can pick this up; any help from the community would be greatly appreciated too.

@reta
Collaborator

reta commented Aug 18, 2021

@Bukhtawar @dblock would you mind if I try to reproduce and (hopefully) fix it? thanks

@reta
Collaborator

reta commented Aug 18, 2021

So far I'm confirming @tlfeng's findings: this is not reproducible for /_cluster/health. The health checks are configured to bypass all circuit breakers, and this applies to both REST and transport actions. More details would certainly help (a quick way to pull them is shown after the list):

  • OpenSearch version
  • installed plugins?
  • where are the logs coming from? (they do not look like OpenSearch server logs)
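
For reference, a quick way to pull the first two of those details from a running node (a sketch, assuming plain HTTP access on localhost:9200):

curl -s "localhost:9200/"                 # reports the OpenSearch version number and distribution
curl -s "localhost:9200/_cat/plugins?v"   # lists the plugins installed on each node

The origin of the quoted c.a.c.e.logger lines would still need to be identified separately, since they do not appear to come from the OpenSearch server itself.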

@dblock
Member

dblock commented Aug 31, 2021

@Bukhtawar @dblock would you mind if I try to reproduce and (hopefully) fix it? thanks

No need to ask for permission! Thank you for contributing.

@minalsha
Contributor

minalsha commented Sep 7, 2021

@Bukhtawar could you please help with the details that @reta is asking for? Thanks

@Bukhtawar
Collaborator Author

I'll try to see if I can repro.

@anasalkouz
Member

Closing this issue. @Bukhtawar, please feel free to reopen in case you are able to reproduce it.

@rramachand21
Member

Reopening as this is an issue that needs to be fixed.

@andrross
Member

andrross commented May 8, 2024

[Triage]
@rramachand21 Do you have any additional information about reproducing this? The findings above suggest that this API should be configured to bypass all circuit breakers.

@ashking94
Member

From what I have seen, the underlying issue may also cause a node to fail to join the cluster, since the node-join call also gets tripped by the circuit breaking exception (CBE), leading to persistent node drops.
