
Kibana alert fires when it should not have due to temporary disconnect of remote CCS connection #168293

Open
henrikno opened this issue Oct 6, 2023 · 7 comments
Labels: bug (Fixes for quality problems that affect the customer experience), Team:ResponseOps (Label for the ResponseOps team, formerly the Cases and Alerting teams)

Comments

henrikno (Contributor) commented Oct 6, 2023

Kibana version:
8.10.2

Elasticsearch version:
8.10.2

Server OS version:
Elastic Cloud

Original install method (e.g. download page, yum, from source, etc.):
Elastic Cloud

Describe the bug:
We have an alert that queries, over a remote CCS connection, for a specific document showing up at least 8 times within 10 minutes. The alert triggered, but when we checked there were zero documents matching the query, and we did not delete any documents. The history does not say that the query failed; it shows up as "Succeeded", yet gives no info about what triggered the alert. The only hint that something iffy happened is that the query took 15 seconds instead of the normal 1-2 seconds.
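For context, a rule matching this description would be an "Elasticsearch query" (`.es-query`) rule created along the lines of the sketch below. This is a minimal illustration rather than the actual configuration from this report: the index pattern, query, and names are placeholders, and the exact request fields should be verified against the Kibana alerting API docs for 8.10.

```
# Sketch only: every value below is a placeholder, not the reporter's real rule.
POST /api/alerting/rule
{
  "name": "specific-doc-count-over-ccs",
  "rule_type_id": ".es-query",
  "consumer": "stackAlerts",
  "schedule": { "interval": "1m" },
  "params": {
    "searchType": "esQuery",
    "index": ["remote_cluster:my-index-*"],
    "timeField": "@timestamp",
    "esQuery": "{\"query\":{\"match\":{\"event.action\":\"the-specific-event\"}}}",
    "size": 10,
    "threshold": [8],
    "thresholdComparator": ">=",
    "timeWindowSize": 10,
    "timeWindowUnit": "m"
  },
  "actions": []
}
```

The parts relevant to this report are the CCS index pattern (`remote_cluster:...`), the per-minute schedule, and the ">= 8 documents within 10 minutes" threshold.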

Steps to reproduce:

  1. Create a Kibana alert that queries over a remote (CCS) connection every minute.
  2. Restart nodes, perform an upgrade, or otherwise disconnect the remote nodes (see the diagnostic sketch after this list).
  3. The Kibana alert triggers; the history shows "Succeeded" but gives no info about why it triggered. It does not show up as a timeout or a failed/unknown status.
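Whether a dropped remote makes the search fail loudly or return quietly depends in part on the remote cluster's `skip_unavailable` setting. As a diagnostic sketch (the `my_remote` alias and the response values are illustrative, and this is an assumption about the setup rather than something confirmed in this thread), the remote connection state can be checked while reproducing step 2:

```
# Shows connection state and skip_unavailable for each configured remote cluster
GET /_remote/info

# Example response excerpt (illustrative values):
# {
#   "my_remote": {
#     "connected": false,
#     "num_nodes_connected": 0,
#     "skip_unavailable": true,
#     "mode": "sniff"
#   }
# }
```

If `skip_unavailable` is true, a search against `my_remote:*` during the disconnect can still be reported as successful overall, with the skipped remote visible only in the `_clusters` section of the search response; that would be consistent with the rule run showing "Succeeded" rather than an error.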

Expected behavior:
I expected the alert not to fire, because there were no hits, or at least to give context that it fired because it could not get results.

Ideally it would not trigger on a transient issue, but would trigger if the issue is sustained (for a configurable time). For instance, this seems to trigger when we do an upgrade, but then resolves itself.

Screenshots (if relevant):
[screenshot attached]

Provide logs and/or server output (if relevant):

Any additional context:

henrikno added the bug label on Oct 6, 2023
botelastic bot added the needs-team label on Oct 6, 2023
jughosta added the Team:ResponseOps label on Oct 17, 2023
elasticmachine (Contributor) commented:

Pinging @elastic/response-ops (Team:ResponseOps)

pmuellr (Member) commented Jan 24, 2024

Can you provide the rule type and the parameters used in the rule?

XavierM self-assigned this Jan 25, 2024
elkargig commented:

Another case where we had this problem was using the "Elasticsearch query" rule.

[screenshot attached]

rule check: every 5 minutes

pmuellr (Member) commented Jan 25, 2024

potentially related to #168293

pmuellr (Member) commented Jan 25, 2024

The action being used was iterating over context.hits to print a field from the doc hits. We advised also printing {{_source._id}} from the hits, so that if this happens again we will see the actual document IDs that the search returned. Hopefully this will provide more background on what is happening.
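As a hedged illustration of that advice (the connector id and action group are placeholders, and the available context variables should be checked against the Elasticsearch query rule documentation), the entry in the rule's `actions` array could carry a message template that prints the raw hit metadata alongside the existing field:

```
{
  "group": "query matched",
  "id": "<connector-id>",
  "params": {
    "message": "{{rule.name}}: {{context.value}} matching documents at {{context.date}}\n{{#context.hits}}doc: {{_id}} (index: {{_index}})\n{{/context.hits}}"
  }
}
```

With that in place, any future spurious firing would record which document ids (if any) the search actually returned.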

XavierM (Contributor) commented Jan 29, 2024

@henrikno I talked to @ymao1 and @pmuellr about this issue. We have other SDHs related to this problem, but for those we do not have access to the data like we do here. To find a solution we need to investigate, and for that we need to log a bit more information in the message, such as the alertId (the _id of the document) and the timestamp of the alert.

Do you think that's possible? And will we be able to access this Kibana?

ymao1 (Contributor) commented Jan 31, 2024

Created a dedicated investigation issue for this (#175980) and linked this issue there for the rule definition.
