[ResponseOps][Alerting] ES Query rule should reflect actual cause of fieldcaps errors #201266

Open
pmuellr opened this issue Nov 21, 2024 · 2 comments
Labels
Feature:Alerting/RuleTypes Issues related to specific Alerting Rules Types Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@pmuellr
Member

pmuellr commented Nov 21, 2024

See #175980 (comment)

When an ES Query / KQL rule runs and its fieldcaps call returns a 404, the error logged is:

Executing Rule default:.es-query:{id} has resulted in Error: Data view with ID {id} no longer contains a time field

This is a bit misleading, because what actually happened was there were no indices matching the fieldcaps request. We should be more precise.

The referenced issue also notes that we have some "bad behavior" when a 502 is returned from fieldcaps. I suspect we'd see the same result. Something seems to be "eating" the errors out of the ES call. Perhaps we can repro this with a jest integration test. We'd obviously like to see that we got a 502 response from the fieldcaps call as the reason for the rule failure.
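As a rough illustration of what "more precise" could look like, here is a minimal sketch that maps the status of a failed fieldcaps call to a log message instead of always reporting a missing time field. The `FieldCapsError` shape, the `describeFieldCapsFailure` name, and the message wording are all assumptions for illustration; they are not the actual Kibana or data views API.

```typescript
// Hypothetical shape of an error surfaced by the fieldcaps call.
// The real error type used by Kibana / data views may differ.
interface FieldCapsError {
  statusCode?: number;
  message: string;
}

// Build a log message that reflects the actual cause of the failure,
// rather than reporting "no longer contains a time field" for every case.
function describeFieldCapsFailure(dataViewId: string, err: FieldCapsError): string {
  const prefix = `Data view with ID ${dataViewId}`;
  switch (err.statusCode) {
    case 404:
      // No indices matched the fieldcaps request.
      return `${prefix}: no indices matched the fieldcaps request (404)`;
    case undefined:
      // No HTTP status available, e.g. a connection-level failure.
      return `${prefix}: fieldcaps request failed: ${err.message}`;
    default:
      // Any other status, e.g. a 502 from a proxy in front of ES.
      return `${prefix}: fieldcaps request failed with status ${err.statusCode}: ${err.message}`;
  }
}
```

With something along these lines, a 502 would show up in the rule failure as a 502 rather than as a time-field problem.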

@pmuellr pmuellr added Feature:Alerting Feature:Alerting/RuleTypes Issues related to specific Alerting Rules Types Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Nov 21, 2024
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@pmuellr
Member Author

pmuellr commented Nov 22, 2024

I want to point out this comment from an SDH https://github.com/elastic/sdh-elasticsearch/issues/8571#issuecomment-2492013970:

it is possible for field_caps to return a 404, but its rare and requires index & shard movement to occur during the API call

So we'll want an indication from DV if it got a 404. We may want to retry these; that's not clear yet. It may be rare enough that failing is fine.
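If we did decide to retry, the decision could be as small as the sketch below: retry only on a 404, and only a bounded number of times, since the 404 is described as a rare, transient condition. The function name, the `maxAttempts` default, and the choice to treat every other status as a hard failure are assumptions, not an agreed design.

```typescript
type FieldCapsNextStep = "retry" | "fail";

// Hypothetical policy: only a 404 is treated as possibly transient
// (index and shard movement during the call); everything else fails fast.
function nextStepAfterFieldCapsError(
  statusCode: number | undefined,
  attempt: number, // 1-based number of the attempt that just failed
  maxAttempts: number = 2
): FieldCapsNextStep {
  const possiblyTransient = statusCode === 404;
  return possiblyTransient && attempt < maxAttempts ? "retry" : "fail";
}
```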
