Alerts are not recovered correctly in ICMP monitoring due to 'ping timeout' #187592

iTiagoCO · 2024-07-04T16:54:59Z

Kibana version: 8.13.2

Elasticsearch version: 8.13.2

Describe the bug:

Alerts created by an Observability rule do not recover once the condition that triggered them is no longer true. This looks like a bug.

Steps to reproduce:

1. Configure a rule in Kibana to monitor ICMP status with the condition MATCHING MONITORS ARE DOWN >= 3 times WITHIN last 10 minutes.
2. Shut down one of the monitored hosts to generate a "ping timeout".
3. Observe that the alert fires correctly but does not recover when the host comes back online.

Expected behavior:

Alerts should automatically recover when the original condition that triggered them is no longer met.
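To make the expected behavior concrete, here is a minimal sketch of the rule condition from the steps above (down >= 3 times within the last 10 minutes) and the state transition I would expect. This is not Kibana's implementation; the function names `is_down_condition_met` and `next_alert_state` and the `(timestamp, status)` check format are assumptions made only for illustration.

```python
from datetime import datetime, timedelta


def is_down_condition_met(checks, now, threshold=3, window=timedelta(minutes=10)):
    """Return True if at least `threshold` checks were "down" within `window` before `now`.

    `checks` is an iterable of (timestamp, status) pairs, status being "up" or "down".
    This format is hypothetical and only mirrors the rule wording in the report.
    """
    window_start = now - window
    down_count = sum(
        1 for ts, status in checks if window_start <= ts <= now and status == "down"
    )
    return down_count >= threshold


def next_alert_state(previously_active, condition_met):
    """Expected transition: an active alert recovers once the condition stops holding."""
    if condition_met:
        return "active"
    return "recovered" if previously_active else "ok"


# Host goes down: three failed pings inside the 10-minute window.
checks = [
    (datetime(2024, 7, 4, 16, 52), "down"),
    (datetime(2024, 7, 4, 16, 54), "down"),
    (datetime(2024, 7, 4, 16, 56), "down"),
]
fired = next_alert_state(False, is_down_condition_met(checks, datetime(2024, 7, 4, 17, 0)))

# Host comes back: the old failures fall out of the window, so the alert should recover.
checks += [
    (datetime(2024, 7, 4, 17, 5), "up"),
    (datetime(2024, 7, 4, 17, 10), "up"),
]
later = next_alert_state(True, is_down_condition_met(checks, datetime(2024, 7, 4, 17, 15)))
```

In this sketch `fired` is "active" and `later` is "recovered". The second transition is the one that does not happen in my environment.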

Additional details, screenshots, and logs for this case are in the Discuss topic I created:

https://discuss.elastic.co/t/problem-with-alerts-recovered/362036

elasticmachine · 2024-07-08T08:39:04Z

Pinging @elastic/response-ops (Team:ResponseOps)

elasticmachine · 2024-07-11T16:06:26Z

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

elasticmachine · 2024-07-11T16:20:38Z

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

iTiagoCO · 2024-07-17T14:02:05Z

Adding the error shown in the terminal:

[ERROR][plugins.ruleRegistry] ResponseError: {"errors":true,"took":5,"ingest_took":3,"items":[{"create":{"_index":".internal.alerts-observability.uptime.alerts-default-000046","_id":"2b6bc7a5-16d5-40d5-a4dc-a49179f60e5d","status":409,"error":{"type":"version_conflict_engine_exception","reason":"[2b6bc7a5-16d5-40d5-a4dc-a49179f60e5d]: version conflict, document already exists (current version [1])","index_uuid":"LSIoSjsTTRuuGWTTmaz7fQ","shard":"0","index":".internal.alerts-observability.uptime.alerts-default-000046"}}},{"create":{"_index":".internal.alerts-observability.uptime.alerts-default-000046","_id":"831e1226-0373-4d0e-bcfc-81a97bea8b67","status":409,"error":{"type":"version_conflict_engine_exception","reason":"[831e1226-0373-4d0e-bcfc-81a97bea8b67]: version conflict, document already exists (current version [1])","index_uuid":"LSIoSjsTTRuuGWTTmaz7fQ","shard":"0","index":".internal.alerts-observability.uptime.alerts-default-000046"}}},{"create":{"_index":".internal.alert...
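For anyone triaging this, the payload after `ResponseError:` has the shape of an Elasticsearch bulk response in which `create` actions returned status 409. A `create` action fails this way when a document with the same `_id` already exists in the index. Whether these conflicts are what prevents recovery is not established here. The sketch below only shows one way to list the conflicting documents; `SAMPLE` is a trimmed excerpt containing the first two items from the log above, and `conflicting_creates` is a hypothetical helper name, not part of Kibana or any Elasticsearch client.

```python
import json

# Trimmed excerpt shaped like the log above; only the first two items are included
# because the original log line is truncated after them.
SAMPLE = json.loads("""
{
  "errors": true,
  "items": [
    {"create": {"_index": ".internal.alerts-observability.uptime.alerts-default-000046",
                "_id": "2b6bc7a5-16d5-40d5-a4dc-a49179f60e5d",
                "status": 409,
                "error": {"type": "version_conflict_engine_exception"}}},
    {"create": {"_index": ".internal.alerts-observability.uptime.alerts-default-000046",
                "_id": "831e1226-0373-4d0e-bcfc-81a97bea8b67",
                "status": 409,
                "error": {"type": "version_conflict_engine_exception"}}}
  ]
}
""")


def conflicting_creates(bulk_body):
    """Return (_id, _index) for every `create` action that failed with status 409."""
    conflicts = []
    for item in bulk_body.get("items", []):
        for action, result in item.items():
            if action == "create" and result.get("status") == 409:
                conflicts.append((result["_id"], result["_index"]))
    return conflicts


for doc_id, index in conflicting_creates(SAMPLE):
    print(f"conflict: {doc_id} already exists in {index}")
```

The listed `_id` values can then be compared against the alert documents already stored in the alerts index to see which alerts the rule tried to create a second time.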

jasonrhodes · 2024-09-11T14:45:15Z

NOTE: There seems to be a lot of additional detail in the linked Discuss thread, so it might be a good idea to capture more of that here in this issue. We'll try to recreate this and see what we can discover.

iTiagoCO added the bug label on Jul 4, 2024

botelastic bot added the needs-team label on Jul 4, 2024

dmlemeshko added the Feature:Alerting and Team:ResponseOps labels on Jul 8, 2024

botelastic bot removed the needs-team label on Jul 8, 2024

mikecote added the Team:uptime and Team:obs-ux-infra_services labels and removed the Team:ResponseOps label on Jul 11, 2024

smith added the Team:obs-ux-management label on Jul 11, 2024

smith removed the Team:obs-ux-infra_services label on Jul 11, 2024
