Alerts are not recovered correctly in ICMP monitoring due to 'ping timeout' #187592

Open
iTiagoCO opened this issue Jul 4, 2024 · 5 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Alerting Team:obs-ux-management Observability Management User Experience Team Team:uptime

Comments

@iTiagoCO

iTiagoCO commented Jul 4, 2024

Kibana version: 8.13.2

Elasticsearch version: 8.13.2

Describe the bug:

Alerts created by the observability rule do not recover when the condition that triggered them is no longer true. This looks like a bug.

Steps to reproduce:

Configure a rule in Kibana to monitor ICMP status with the condition "MATCHING MONITORS ARE DOWN >= 3 times WITHIN last 10 minutes".
Shut down one of the monitored hosts to generate a "ping timeout".

Note that the alert fires correctly but does not recover when the host comes back online.

Expected behavior:

Alerts should automatically recover when the original condition that triggered them is no longer met.
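
To make the expected behavior concrete, here is a minimal Python sketch of the rule condition from the steps above (down >= 3 times within the last 10 minutes) and the alert state that should follow from it. This is only a model of the expectation, not Kibana's implementation; the function names `is_down_condition_met` and `next_alert_state`, the state names, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def is_down_condition_met(down_check_times, now, threshold=3,
                          window=timedelta(minutes=10)):
    """Return True when at least `threshold` down checks fall in the trailing window."""
    window_start = now - window
    recent = [t for t in down_check_times if window_start <= t <= now]
    return len(recent) >= threshold


def next_alert_state(currently_active, condition_met):
    """Hypothetical state model: an active alert recovers once the condition clears."""
    if condition_met:
        return "active"
    return "recovered" if currently_active else "ok"


# Illustrative timestamps only: three failed pings shortly before `now`.
now = datetime(2024, 7, 4, 12, 0)
downs = [now - timedelta(minutes=m) for m in (1, 3, 5)]

# While the host is down, the condition holds and the alert fires.
state_while_down = next_alert_state(False, is_down_condition_met(downs, now))

# Fifteen minutes later, with no new failures, the window is empty,
# so the alert is expected to move to "recovered".
later = now + timedelta(minutes=15)
state_after_recovery = next_alert_state(True, is_down_condition_met(downs, later))
```

In this model, `state_after_recovery` is the transition that does not seem to happen in the reported case: the alert stays active even though no failed checks remain inside the window.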

Additional context: I created a Discuss thread for this case that contains more details, screenshots, logs, etc.

https://discuss.elastic.co/t/problem-with-alerts-recovered/362036

@iTiagoCO iTiagoCO added the bug Fixes for quality problems that affect the customer experience label Jul 4, 2024
@botelastic botelastic bot added the needs-team Issues missing a team label label Jul 4, 2024
@dmlemeshko dmlemeshko added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jul 8, 2024
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Jul 8, 2024
@mikecote mikecote added Team:uptime Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team and removed Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jul 11, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@smith smith added the Team:obs-ux-management Observability Management User Experience Team label Jul 11, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@smith smith removed the Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team label Jul 11, 2024
@iTiagoCO
Author

Adding the error shown in the terminal:

[ERROR][plugins.ruleRegistry] ResponseError: {"errors":true,"took":5,"ingest_took":3,"items":[{"create":{"_index":".internal.alerts-observability.uptime.alerts-default-000046","_id":"2b6bc7a5-16d5-40d5-a4dc-a49179f60e5d","status":409,"error":{"type":"version_conflict_engine_exception","reason":"[2b6bc7a5-16d5-40d5-a4dc-a49179f60e5d]: version conflict, document already exists (current version [1])","index_uuid":"LSIoSjsTTRuuGWTTmaz7fQ","shard":"0","index":".internal.alerts-observability.uptime.alerts-default-000046"}}},{"create":{"_index":".internal.alerts-observability.uptime.alerts-default-000046","_id":"831e1226-0373-4d0e-bcfc-81a97bea8b67","status":409,"error":{"type":"version_conflict_engine_exception","reason":"[831e1226-0373-4d0e-bcfc-81a97bea8b67]: version conflict, document already exists (current version [1])","index_uuid":"LSIoSjsTTRuuGWTTmaz7fQ","shard":"0","index":".internal.alerts-observability.uptime.alerts-default-000046"}}},{"create":{"_index":".internal.alert...
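
For anyone triaging a similar log, the payload after `ResponseError:` has the shape of an Elasticsearch bulk response: a top-level `errors` flag and an `items` array with one entry per action, each carrying a `status` and, on failure, an `error` object. The sketch below is a rough helper for listing the failed items, assuming the JSON has already been extracted from the log line and is complete (the line above is truncated, so it cannot be parsed as-is). The function name `summarize_bulk_errors`, the index name, and the document IDs in the sample are placeholders, not values from this issue.

```python
from collections import Counter


def summarize_bulk_errors(bulk_response):
    """Collect the failed items from a bulk-response dict."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action type, e.g. "create" or "index".
        for action, result in item.items():
            error = result.get("error")
            if error:
                failures.append({
                    "action": action,
                    "index": result.get("_index"),
                    "id": result.get("_id"),
                    "status": result.get("status"),
                    "type": error.get("type"),
                })
    return failures


# Placeholder payload that mirrors the structure of the log above.
sample = {
    "errors": True,
    "took": 5,
    "items": [
        {"create": {
            "_index": "example-alerts-index",
            "_id": "example-id-1",
            "status": 409,
            "error": {
                "type": "version_conflict_engine_exception",
                "reason": "[example-id-1]: version conflict, document already exists",
            },
        }},
        {"create": {
            "_index": "example-alerts-index",
            "_id": "example-id-2",
            "status": 201,
        }},
    ],
}

failures = summarize_bulk_errors(sample)
counts_by_type = Counter(f["type"] for f in failures)
```

Grouping by error type makes it easier to see whether every failed `create` is a 409 version conflict, as in the excerpt above, or whether other error types are mixed in.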

@jasonrhodes
Member

NOTE: There seems to be a lot of additional detail in the linked Discuss thread; it might be a good idea to capture more of it here in this issue. We'll try to recreate this and see what we can discover.


6 participants