-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ResponseOps] log error when ES Query rules find docs out of time range #186332
Conversation
/ci |
I think it might also be useful to add the calculated We're starting to see folks send us update: decided to not do this; I'm guessing we may need to tweak what we log, or find we need the info elsewhere, so I think we can defer doing this, till we find we need it. |
resolves elastic#175980 Adds a check with logging if an ES Query rule returns hits which fall outside the time range it's searching. This shouldn't ever happen, but seems to be happening on rare occaisons, so we wanted to add some diagnostics to try to help narrow down the problem.
1535a1b
to
dd6c861
Compare
/ci |
/ci |
Pinging @elastic/response-ops (Team:ResponseOps) |
/ci |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Verified works as expected.
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]
History
|
@elasticmachine merge upstream |
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]
History
To update your PR or re-run it, just comment with: |
💔 All backports failed
Manual backportTo create the backport manually run:
Questions ?Please refer to the Backport tool documentation |
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
…ge (elastic#186332) resolves elastic#175980 ## Summary Adds a check with logging if an ES Query rule returns hits which fall outside the time range it's searching. This shouldn't ever happen, but seems to be happening on rare occaisons, so we wanted to add some diagnostics to try to help narrow down the problem. Note that the ES|QL flavor rule does not use this diagnostic, just search source (KQL) and query dsl. We check 3 things: - ensure the `dateStart` sent to fetch was valid - ensure the `dateEnd` sent to fetch was valid - ensure the relevant time fields in hits are within the dateStart/dateEnd range These produce three different error messages: `For rule '<rule-id>', hits were returned with invalid time range start date '<date>' from field '<field>' using query <query>` `For rule '<rule-id>', hits were returned with invalid time range end date '<date>' from field '<field>' using query <query>` `For rule '<rule-id>', the hit with date '<date>' from field '<field>' is outside the query time range. Query: <query>. Document: <document>` Each message has one tag on it: `query-result-out-of-time-range` ## To Verify To test invalid dateStart/dateEnd, hack the Kibana code to set the values to NaN's: https://github.com/elastic/kibana/blob/d30da09707f85d84d7fd555733ba8e0cb595228b/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts#L263-L264 For instance, change that to: const epochStart = new Date('x').getTime(); const epochEnd = new Date('y').getTime(); To test the invdivual document hits, first back out the change you made above - when those error, the checks we're testing below do not run. Hack the Kibana code to make the time out of range: https://github.com/elastic/kibana/blob/d30da09707f85d84d7fd555733ba8e0cb595228b/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts#L294 For instance, change that to: const epochDate = epochStart - 100 For both tests, create an es query rule - kql or dsl - make the relevant changes, and arrange for the rule to get hits each time. The relevant messages should be logged in the Kibana console when the rule runs. ### Checklist Delete any items that are not applicable to this PR. - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios --------- Co-authored-by: Elastic Machine <[email protected]> (cherry picked from commit e12e449) # Conflicts: # x-pack/plugins/stack_alerts/server/rule_types/es_query/lib/fetch_search_source_query.ts
…ime range (#186332) (#189019) # Backport This will backport the following commits from `main` to `8.15`: - [[ResponseOps] log error when ES Query rules find docs out of time range (#186332)](#186332) <!--- Backport version: 8.9.8 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Patrick Mueller","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-07-23T20:17:11Z","message":"[ResponseOps] log error when ES Query rules find docs out of time range (#186332)\n\nresolves https://github.com/elastic/kibana/issues/175980\r\n\r\n## Summary\r\n\r\nAdds a check with logging if an ES Query rule returns hits which fall\r\noutside the time range it's searching. This shouldn't ever happen, but\r\nseems to be happening on rare occaisons, so we wanted to add some\r\ndiagnostics to try to help narrow down the problem.\r\n\r\nNote that the ES|QL flavor rule does not use this diagnostic, just\r\nsearch source (KQL) and query dsl.\r\n\r\nWe check 3 things:\r\n- ensure the `dateStart` sent to fetch was valid\r\n- ensure the `dateEnd` sent to fetch was valid\r\n- ensure the relevant time fields in hits are within the\r\ndateStart/dateEnd range\r\n\r\nThese produce three different error messages:\r\n\r\n`For rule '<rule-id>', hits were returned with invalid time range start\r\ndate '<date>' from field '<field>' using query <query>`\r\n\r\n`For rule '<rule-id>', hits were returned with invalid time range end\r\ndate '<date>' from field '<field>' using query <query>`\r\n\r\n`For rule '<rule-id>', the hit with date '<date>' from field '<field>'\r\nis outside the query time range. Query: <query>. Document: <document>`\r\n\r\nEach message has one tag on it: `query-result-out-of-time-range`\r\n\r\n## To Verify\r\n\r\nTo test invalid dateStart/dateEnd, hack the Kibana code to set the\r\nvalues to NaN's:\r\n\r\n\r\nhttps://github.com/elastic/kibana/blob/d30da09707f85d84d7fd555733ba8e0cb595228b/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts#L263-L264\r\n\r\nFor instance, change that to:\r\n\r\n const epochStart = new Date('x').getTime();\r\n const epochEnd = new Date('y').getTime();\r\n\r\nTo test the invdivual document hits, first back out the change you made\r\nabove - when those error, the checks we're testing below do not run.\r\nHack the Kibana code to make the time out of range:\r\n\r\n\r\nhttps://github.com/elastic/kibana/blob/d30da09707f85d84d7fd555733ba8e0cb595228b/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts#L294\r\n\r\nFor instance, change that to:\r\n\r\n const epochDate = epochStart - 100\r\n\r\nFor both tests, create an es query rule - kql or dsl - make the relevant\r\nchanges, and arrange for the rule to get hits each time. The relevant\r\nmessages should be logged in the Kibana console when the rule runs.\r\n\r\n### Checklist\r\n\r\nDelete any items that are not applicable to this PR.\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine <[email protected]>","sha":"e12e4496e0594d34509ef9235d1d7b7e3461e5d8","branchLabelMapping":{"^v8.16.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Feature:Alerting","release_note:skip","Team:ResponseOps","Feature:Alerting/RulesFramework","backport:prev-minor","v8.16.0"],"number":186332,"url":"https://github.com/elastic/kibana/pull/186332","mergeCommit":{"message":"[ResponseOps] log error when ES Query rules find docs out of time range (#186332)\n\nresolves https://github.com/elastic/kibana/issues/175980\r\n\r\n## Summary\r\n\r\nAdds a check with logging if an ES Query rule returns hits which fall\r\noutside the time range it's searching. This shouldn't ever happen, but\r\nseems to be happening on rare occaisons, so we wanted to add some\r\ndiagnostics to try to help narrow down the problem.\r\n\r\nNote that the ES|QL flavor rule does not use this diagnostic, just\r\nsearch source (KQL) and query dsl.\r\n\r\nWe check 3 things:\r\n- ensure the `dateStart` sent to fetch was valid\r\n- ensure the `dateEnd` sent to fetch was valid\r\n- ensure the relevant time fields in hits are within the\r\ndateStart/dateEnd range\r\n\r\nThese produce three different error messages:\r\n\r\n`For rule '<rule-id>', hits were returned with invalid time range start\r\ndate '<date>' from field '<field>' using query <query>`\r\n\r\n`For rule '<rule-id>', hits were returned with invalid time range end\r\ndate '<date>' from field '<field>' using query <query>`\r\n\r\n`For rule '<rule-id>', the hit with date '<date>' from field '<field>'\r\nis outside the query time range. Query: <query>. Document: <document>`\r\n\r\nEach message has one tag on it: `query-result-out-of-time-range`\r\n\r\n## To Verify\r\n\r\nTo test invalid dateStart/dateEnd, hack the Kibana code to set the\r\nvalues to NaN's:\r\n\r\n\r\nhttps://github.com/elastic/kibana/blob/d30da09707f85d84d7fd555733ba8e0cb595228b/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts#L263-L264\r\n\r\nFor instance, change that to:\r\n\r\n const epochStart = new Date('x').getTime();\r\n const epochEnd = new Date('y').getTime();\r\n\r\nTo test the invdivual document hits, first back out the change you made\r\nabove - when those error, the checks we're testing below do not run.\r\nHack the Kibana code to make the time out of range:\r\n\r\n\r\nhttps://github.com/elastic/kibana/blob/d30da09707f85d84d7fd555733ba8e0cb595228b/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts#L294\r\n\r\nFor instance, change that to:\r\n\r\n const epochDate = epochStart - 100\r\n\r\nFor both tests, create an es query rule - kql or dsl - make the relevant\r\nchanges, and arrange for the rule to get hits each time. The relevant\r\nmessages should be logged in the Kibana console when the rule runs.\r\n\r\n### Checklist\r\n\r\nDelete any items that are not applicable to this PR.\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine <[email protected]>","sha":"e12e4496e0594d34509ef9235d1d7b7e3461e5d8"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.16.0","labelRegex":"^v8.16.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/186332","number":186332,"mergeCommit":{"message":"[ResponseOps] log error when ES Query rules find docs out of time range (#186332)\n\nresolves https://github.com/elastic/kibana/issues/175980\r\n\r\n## Summary\r\n\r\nAdds a check with logging if an ES Query rule returns hits which fall\r\noutside the time range it's searching. This shouldn't ever happen, but\r\nseems to be happening on rare occaisons, so we wanted to add some\r\ndiagnostics to try to help narrow down the problem.\r\n\r\nNote that the ES|QL flavor rule does not use this diagnostic, just\r\nsearch source (KQL) and query dsl.\r\n\r\nWe check 3 things:\r\n- ensure the `dateStart` sent to fetch was valid\r\n- ensure the `dateEnd` sent to fetch was valid\r\n- ensure the relevant time fields in hits are within the\r\ndateStart/dateEnd range\r\n\r\nThese produce three different error messages:\r\n\r\n`For rule '<rule-id>', hits were returned with invalid time range start\r\ndate '<date>' from field '<field>' using query <query>`\r\n\r\n`For rule '<rule-id>', hits were returned with invalid time range end\r\ndate '<date>' from field '<field>' using query <query>`\r\n\r\n`For rule '<rule-id>', the hit with date '<date>' from field '<field>'\r\nis outside the query time range. Query: <query>. Document: <document>`\r\n\r\nEach message has one tag on it: `query-result-out-of-time-range`\r\n\r\n## To Verify\r\n\r\nTo test invalid dateStart/dateEnd, hack the Kibana code to set the\r\nvalues to NaN's:\r\n\r\n\r\nhttps://github.com/elastic/kibana/blob/d30da09707f85d84d7fd555733ba8e0cb595228b/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts#L263-L264\r\n\r\nFor instance, change that to:\r\n\r\n const epochStart = new Date('x').getTime();\r\n const epochEnd = new Date('y').getTime();\r\n\r\nTo test the invdivual document hits, first back out the change you made\r\nabove - when those error, the checks we're testing below do not run.\r\nHack the Kibana code to make the time out of range:\r\n\r\n\r\nhttps://github.com/elastic/kibana/blob/d30da09707f85d84d7fd555733ba8e0cb595228b/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts#L294\r\n\r\nFor instance, change that to:\r\n\r\n const epochDate = epochStart - 100\r\n\r\nFor both tests, create an es query rule - kql or dsl - make the relevant\r\nchanges, and arrange for the rule to get hits each time. The relevant\r\nmessages should be logged in the Kibana console when the rule runs.\r\n\r\n### Checklist\r\n\r\nDelete any items that are not applicable to this PR.\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine <[email protected]>","sha":"e12e4496e0594d34509ef9235d1d7b7e3461e5d8"}}]}] BACKPORT-->
* master: (3487 commits) `BedrockChat` & `GeminiChat` (elastic#186809) [ResponseOps] log error when ES Query rules find docs out of time range (elastic#186332) skip flaky suite (elastic#188997) [Security solution][Alert Details] Enable preview feature flag and cypress tests (elastic#188580) [EuiProviders] Warn Developer if EuiProvider is missing (elastic#184608) [Security Solution ] Fixes Timeline infinite loading bug (elastic#188943) Improve SearchSource SearchRequest type (elastic#186862) Deprecate Search Sessions config (elastic#188037) [Synthetics] Add missing monitorType and tag info in cards !! (elastic#188824) [Console Monaco] Resolve uncaught error from tokenizer (elastic#188746) [Data Forge] Add `service.logs` dataset as a data stream (elastic#188786) [Console] Fix failing bulk requests (elastic#188552) Update dependency terser to ^5.31.2 (main) (elastic#188528) [APM][ECO] Telemetry (elastic#188627) [Fleet] Fix uninstall package validation accross space (elastic#188749) Update warning on `xpack.fleet.enableExperimental` (elastic#188917) [DOCS][Cases] Automate more screenshots for cases (elastic#188697) [Fleet] Fix get one agent when feature flag disabled (elastic#188953) chore(investigate): Add investigate-app plugin from poc (elastic#188122) [Monaco Editor] Add Search functionality (elastic#188337) ...
* master: (2400 commits) `BedrockChat` & `GeminiChat` (elastic#186809) [ResponseOps] log error when ES Query rules find docs out of time range (elastic#186332) skip flaky suite (elastic#188997) [Security solution][Alert Details] Enable preview feature flag and cypress tests (elastic#188580) [EuiProviders] Warn Developer if EuiProvider is missing (elastic#184608) [Security Solution ] Fixes Timeline infinite loading bug (elastic#188943) Improve SearchSource SearchRequest type (elastic#186862) Deprecate Search Sessions config (elastic#188037) [Synthetics] Add missing monitorType and tag info in cards !! (elastic#188824) [Console Monaco] Resolve uncaught error from tokenizer (elastic#188746) [Data Forge] Add `service.logs` dataset as a data stream (elastic#188786) [Console] Fix failing bulk requests (elastic#188552) Update dependency terser to ^5.31.2 (main) (elastic#188528) [APM][ECO] Telemetry (elastic#188627) [Fleet] Fix uninstall package validation accross space (elastic#188749) Update warning on `xpack.fleet.enableExperimental` (elastic#188917) [DOCS][Cases] Automate more screenshots for cases (elastic#188697) [Fleet] Fix get one agent when feature flag disabled (elastic#188953) chore(investigate): Add investigate-app plugin from poc (elastic#188122) [Monaco Editor] Add Search functionality (elastic#188337) ...
resolves #175980
Summary
Adds a check with logging if an ES Query rule returns hits which fall outside the time range it's searching. This shouldn't ever happen, but seems to be happening on rare occaisons, so we wanted to add some diagnostics to try to help narrow down the problem.
Note that the ES|QL flavor rule does not use this diagnostic, just search source (KQL) and query dsl.
We check 3 things:
dateStart
sent to fetch was validdateEnd
sent to fetch was validThese produce three different error messages:
For rule '<rule-id>', hits were returned with invalid time range start date '<date>' from field '<field>' using query <query>
For rule '<rule-id>', hits were returned with invalid time range end date '<date>' from field '<field>' using query <query>
For rule '<rule-id>', the hit with date '<date>' from field '<field>' is outside the query time range. Query: <query>. Document: <document>
Each message has one tag on it:
query-result-out-of-time-range
To Verify
To test invalid dateStart/dateEnd, hack the Kibana code to set the values to NaN's:
kibana/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts
Lines 263 to 264 in d30da09
For instance, change that to:
To test the invdivual document hits, first back out the change you made above - when those error, the checks we're testing below do not run. Hack the Kibana code to make the time out of range:
kibana/x-pack/plugins/stack_alerts/server/rule_types/es_query/executor.ts
Line 294 in d30da09
For instance, change that to:
For both tests, create an es query rule - kql or dsl - make the relevant changes, and arrange for the rule to get hits each time. The relevant messages should be logged in the Kibana console when the rule runs.
Checklist
Delete any items that are not applicable to this PR.