-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let addLastRunError to report user errors #180040
Let addLastRunError to report user errors #180040
Conversation
/ci |
For #180094, it would be convenient to have the |
/ci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for putting this together
warnings: string[]; | ||
outcomeMessage: string; | ||
} | ||
|
||
interface LastRunError { | ||
message: string; | ||
userError?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional nit:
userError?: boolean; | |
userError: boolean; |
@@ -715,10 +715,11 @@ export class TaskRunner< | |||
|
|||
const { errors: errorsFromLastRun } = this.ruleResult.getLastRunResults(); | |||
if (errorsFromLastRun.length > 0) { | |||
const isUserError = !errorsFromLastRun.some((lastRunError) => !lastRunError.userError); | |||
return { | |||
taskRunError: createTaskRunError( | |||
new Error(errorsFromLastRun.join(',')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: do we need to map here as well?
new Error(errorsFromLastRun.join(',')), | |
new Error(errorsFromLastRun.map((lastRunError) => lastRunError.message).join(',')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed.
Pinging @elastic/response-ops (Team:ResponseOps) |
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]
History
To update your PR or re-run it, just comment with: cc @ersin-erdal |
…rs as user errors (#180094) ## Summary Building on #180040 Detection rules commonly fail when prebuilt rules are imported and enabled without the appropriate indices (for EQL) or ML jobs (for ML rules). EQL rules fail with a `verification_exception` because the EQL search API validates the fields in the search request against the indices in the request; if there are no indices then it returns an exception. ML rules fail with a `<job name> missing` exception on the search request if the job is not found. Both of these errors do not mean that the system is overloaded or performing incorrectly in some way, but they are still showing up in large volumes on SLO dashboards. This PR builds on #180040, which introduces the ability to classify errors as "user errors" when the error is not due to some kind of system malfunction, but more related to incorrect (or insufficient) user actions. ### Testing #### EQL 1. Create 2 indices, `test` and `test2` ``` PUT /test { "mappings": { "properties": { "@timestamp": { "type": "date" } } } } PUT /test2 { "mappings": { "properties": { "@timestamp": { "type": "date" }, "event.category": { "type": "keyword" } } } } ``` 2. Create (disabled) an EQL rule that queries `test*` and uses a simple query like `file where true` 3. Delete the index `test2` 4. Enable the rule. The rule will fail with a `verification_exception` because `test` does not have `event.category`. 5. Use your favorite debugging method to verify that `userError` was `true` in `addLastRunError` in `x-pack/plugins/alerting/server/monitoring/rule_result_service.ts` (hopefully this property will be added to the rule SO so we can check it there in API integration tests) #### ML rules 1. Import a prebuilt ML rule (`Unusual Process Spawned by a User`, for example) 2. Enable the rule. The rule will fail with `An error occurred during rule execution: message: "problem_child_rare_process_by_user missing"` 3. Use your favorite debugging method to verify that `userError` was `true` in `addLastRunError` in `x-pack/plugins/alerting/server/monitoring/rule_result_service.ts` (hopefully this property will be added to the rule SO so we can check it there in API integration tests)
…rs as user errors (elastic#180094) ## Summary Building on elastic#180040 Detection rules commonly fail when prebuilt rules are imported and enabled without the appropriate indices (for EQL) or ML jobs (for ML rules). EQL rules fail with a `verification_exception` because the EQL search API validates the fields in the search request against the indices in the request; if there are no indices then it returns an exception. ML rules fail with a `<job name> missing` exception on the search request if the job is not found. Both of these errors do not mean that the system is overloaded or performing incorrectly in some way, but they are still showing up in large volumes on SLO dashboards. This PR builds on elastic#180040, which introduces the ability to classify errors as "user errors" when the error is not due to some kind of system malfunction, but more related to incorrect (or insufficient) user actions. ### Testing #### EQL 1. Create 2 indices, `test` and `test2` ``` PUT /test { "mappings": { "properties": { "@timestamp": { "type": "date" } } } } PUT /test2 { "mappings": { "properties": { "@timestamp": { "type": "date" }, "event.category": { "type": "keyword" } } } } ``` 2. Create (disabled) an EQL rule that queries `test*` and uses a simple query like `file where true` 3. Delete the index `test2` 4. Enable the rule. The rule will fail with a `verification_exception` because `test` does not have `event.category`. 5. Use your favorite debugging method to verify that `userError` was `true` in `addLastRunError` in `x-pack/plugins/alerting/server/monitoring/rule_result_service.ts` (hopefully this property will be added to the rule SO so we can check it there in API integration tests) #### ML rules 1. Import a prebuilt ML rule (`Unusual Process Spawned by a User`, for example) 2. Enable the rule. The rule will fail with `An error occurred during rule execution: message: "problem_child_rare_process_by_user missing"` 3. Use your favorite debugging method to verify that `userError` was `true` in `addLastRunError` in `x-pack/plugins/alerting/server/monitoring/rule_result_service.ts` (hopefully this property will be added to the rule SO so we can check it there in API integration tests)
Resolves: #180030