Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[8.15] [Detection Engine] Addresses Flakiness in ML FTR tests (#188155)…
… (#188259) # Backport This will backport the following commits from `main` to `8.15`: - [[Detection Engine] Addresses Flakiness in ML FTR tests (#188155)](#188155) <!--- Backport version: 8.9.8 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Ryland Herrick","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-07-12T19:10:25Z","message":"[Detection Engine] Addresses Flakiness in ML FTR tests (#188155)\n\n## Summary\r\n\r\nThe full chronicle of this endeavor can be found\r\n[here](#182183), but [this\r\ncomment](https://github.com/elastic/kibana/pull/182183#issuecomment-2221517519)\r\nsummarizes the identified issue:\r\n\r\n> I [finally\r\nfound](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6516#01909dde-a3e8-4e47-b255-b1ff7cac8f8d/6-2368)\r\nthe cause of these failures in the response to our \"setup modules\"\r\nrequest to ML. Attaching here for posterity:\r\n>\r\n> <details>\r\n> <summary>Setup Modules Failure Response</summary>\r\n> \r\n> ```json\r\n> {\r\n> \"jobs\": [\r\n> { \"id\": \"v3_linux_anomalous_network_port_activity\", \"success\": true },\r\n> {\r\n> \"id\": \"v3_linux_anomalous_network_activity\",\r\n> \"success\": false,\r\n> \"error\": {\r\n> \"error\": {\r\n> \"root_cause\": [\r\n> {\r\n> \"type\": \"no_shard_available_action_exception\",\r\n> \"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n> }\r\n> ],\r\n> \"type\": \"search_phase_execution_exception\",\r\n> \"reason\": \"all shards failed\",\r\n> \"phase\": \"query\",\r\n> \"grouped\": true,\r\n> \"failed_shards\": [\r\n> {\r\n> \"shard\": 0,\r\n> \"index\":\r\n\".ml-anomalies-custom-v3_linux_network_configuration_discovery\",\r\n> \"node\": \"dKzpvp06ScO0OxqHilETEA\",\r\n> \"reason\": {\r\n> \"type\": \"no_shard_available_action_exception\",\r\n> \"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n> }\r\n> }\r\n> ]\r\n> },\r\n> \"status\": 503\r\n> }\r\n> }\r\n> ],\r\n> \"datafeeds\": [\r\n> {\r\n> \"id\": \"datafeed-v3_linux_anomalous_network_port_activity\",\r\n> \"success\": true,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\": false\r\n> },\r\n> {\r\n> \"id\": \"datafeed-v3_linux_anomalous_network_activity\",\r\n> \"success\": false,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\": false,\r\n> \"error\": {\r\n> \"error\": {\r\n> \"root_cause\": [\r\n> {\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\": \"No known job with id 'v3_linux_anomalous_network_activity'\"\r\n> }\r\n> ],\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\": \"No known job with id 'v3_linux_anomalous_network_activity'\"\r\n> },\r\n> \"status\": 404\r\n> }\r\n> }\r\n> ],\r\n> \"kibana\": {}\r\n> }\r\n> \r\n> ```\r\n> </details>\r\n\r\nThis branch, then, fixes said issue by (relatively simply) retrying the\r\nfailed API call until it succeeds.\r\n\r\n### Related Issues\r\nAddresses:\r\n- https://github.com/elastic/kibana/issues/171426\r\n- https://github.com/elastic/kibana/issues/187478\r\n- https://github.com/elastic/kibana/issues/187614\r\n- https://github.com/elastic/kibana/issues/182009\r\n- https://github.com/elastic/kibana/issues/171426\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n- [x] [Flaky Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\r\nused on any tests changed\r\n- [x] [ESS Rule Execution FTR x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6528)\r\n- [x] [Serverless Rule Execution FTR x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6529)\r\n\r\n\r\n### For maintainers\r\n\r\n- [x] This was checked for breaking API changes and was [labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"3df635ef4a8c86c41c91ac5f59198a9b67d1dc8b","branchLabelMapping":{"^v8.16.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","backport:skip","Feature:Detection Rules","Feature:ML Rule","Feature:Security ML Jobs","Feature:Rule Creation","Team:Detection Engine","Feature:Rule Edit","v8.16.0"],"number":188155,"url":"https://github.com/elastic/kibana/pull/188155","mergeCommit":{"message":"[Detection Engine] Addresses Flakiness in ML FTR tests (#188155)\n\n## Summary\r\n\r\nThe full chronicle of this endeavor can be found\r\n[here](#182183), but [this\r\ncomment](https://github.com/elastic/kibana/pull/182183#issuecomment-2221517519)\r\nsummarizes the identified issue:\r\n\r\n> I [finally\r\nfound](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6516#01909dde-a3e8-4e47-b255-b1ff7cac8f8d/6-2368)\r\nthe cause of these failures in the response to our \"setup modules\"\r\nrequest to ML. Attaching here for posterity:\r\n>\r\n> <details>\r\n> <summary>Setup Modules Failure Response</summary>\r\n> \r\n> ```json\r\n> {\r\n> \"jobs\": [\r\n> { \"id\": \"v3_linux_anomalous_network_port_activity\", \"success\": true },\r\n> {\r\n> \"id\": \"v3_linux_anomalous_network_activity\",\r\n> \"success\": false,\r\n> \"error\": {\r\n> \"error\": {\r\n> \"root_cause\": [\r\n> {\r\n> \"type\": \"no_shard_available_action_exception\",\r\n> \"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n> }\r\n> ],\r\n> \"type\": \"search_phase_execution_exception\",\r\n> \"reason\": \"all shards failed\",\r\n> \"phase\": \"query\",\r\n> \"grouped\": true,\r\n> \"failed_shards\": [\r\n> {\r\n> \"shard\": 0,\r\n> \"index\":\r\n\".ml-anomalies-custom-v3_linux_network_configuration_discovery\",\r\n> \"node\": \"dKzpvp06ScO0OxqHilETEA\",\r\n> \"reason\": {\r\n> \"type\": \"no_shard_available_action_exception\",\r\n> \"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n> }\r\n> }\r\n> ]\r\n> },\r\n> \"status\": 503\r\n> }\r\n> }\r\n> ],\r\n> \"datafeeds\": [\r\n> {\r\n> \"id\": \"datafeed-v3_linux_anomalous_network_port_activity\",\r\n> \"success\": true,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\": false\r\n> },\r\n> {\r\n> \"id\": \"datafeed-v3_linux_anomalous_network_activity\",\r\n> \"success\": false,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\": false,\r\n> \"error\": {\r\n> \"error\": {\r\n> \"root_cause\": [\r\n> {\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\": \"No known job with id 'v3_linux_anomalous_network_activity'\"\r\n> }\r\n> ],\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\": \"No known job with id 'v3_linux_anomalous_network_activity'\"\r\n> },\r\n> \"status\": 404\r\n> }\r\n> }\r\n> ],\r\n> \"kibana\": {}\r\n> }\r\n> \r\n> ```\r\n> </details>\r\n\r\nThis branch, then, fixes said issue by (relatively simply) retrying the\r\nfailed API call until it succeeds.\r\n\r\n### Related Issues\r\nAddresses:\r\n- https://github.com/elastic/kibana/issues/171426\r\n- https://github.com/elastic/kibana/issues/187478\r\n- https://github.com/elastic/kibana/issues/187614\r\n- https://github.com/elastic/kibana/issues/182009\r\n- https://github.com/elastic/kibana/issues/171426\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n- [x] [Flaky Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\r\nused on any tests changed\r\n- [x] [ESS Rule Execution FTR x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6528)\r\n- [x] [Serverless Rule Execution FTR x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6529)\r\n\r\n\r\n### For maintainers\r\n\r\n- [x] This was checked for breaking API changes and was [labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"3df635ef4a8c86c41c91ac5f59198a9b67d1dc8b"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.16.0","labelRegex":"^v8.16.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/188155","number":188155,"mergeCommit":{"message":"[Detection Engine] Addresses Flakiness in ML FTR tests (#188155)\n\n## Summary\r\n\r\nThe full chronicle of this endeavor can be found\r\n[here](#182183), but [this\r\ncomment](https://github.com/elastic/kibana/pull/182183#issuecomment-2221517519)\r\nsummarizes the identified issue:\r\n\r\n> I [finally\r\nfound](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6516#01909dde-a3e8-4e47-b255-b1ff7cac8f8d/6-2368)\r\nthe cause of these failures in the response to our \"setup modules\"\r\nrequest to ML. Attaching here for posterity:\r\n>\r\n> <details>\r\n> <summary>Setup Modules Failure Response</summary>\r\n> \r\n> ```json\r\n> {\r\n> \"jobs\": [\r\n> { \"id\": \"v3_linux_anomalous_network_port_activity\", \"success\": true },\r\n> {\r\n> \"id\": \"v3_linux_anomalous_network_activity\",\r\n> \"success\": false,\r\n> \"error\": {\r\n> \"error\": {\r\n> \"root_cause\": [\r\n> {\r\n> \"type\": \"no_shard_available_action_exception\",\r\n> \"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n> }\r\n> ],\r\n> \"type\": \"search_phase_execution_exception\",\r\n> \"reason\": \"all shards failed\",\r\n> \"phase\": \"query\",\r\n> \"grouped\": true,\r\n> \"failed_shards\": [\r\n> {\r\n> \"shard\": 0,\r\n> \"index\":\r\n\".ml-anomalies-custom-v3_linux_network_configuration_discovery\",\r\n> \"node\": \"dKzpvp06ScO0OxqHilETEA\",\r\n> \"reason\": {\r\n> \"type\": \"no_shard_available_action_exception\",\r\n> \"reason\":\r\n\"[ftr][127.0.0.1:9300][indices:data/read/search[phase/query]]\"\r\n> }\r\n> }\r\n> ]\r\n> },\r\n> \"status\": 503\r\n> }\r\n> }\r\n> ],\r\n> \"datafeeds\": [\r\n> {\r\n> \"id\": \"datafeed-v3_linux_anomalous_network_port_activity\",\r\n> \"success\": true,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\": false\r\n> },\r\n> {\r\n> \"id\": \"datafeed-v3_linux_anomalous_network_activity\",\r\n> \"success\": false,\r\n> \"started\": false,\r\n> \"awaitingMlNodeAllocation\": false,\r\n> \"error\": {\r\n> \"error\": {\r\n> \"root_cause\": [\r\n> {\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\": \"No known job with id 'v3_linux_anomalous_network_activity'\"\r\n> }\r\n> ],\r\n> \"type\": \"resource_not_found_exception\",\r\n> \"reason\": \"No known job with id 'v3_linux_anomalous_network_activity'\"\r\n> },\r\n> \"status\": 404\r\n> }\r\n> }\r\n> ],\r\n> \"kibana\": {}\r\n> }\r\n> \r\n> ```\r\n> </details>\r\n\r\nThis branch, then, fixes said issue by (relatively simply) retrying the\r\nfailed API call until it succeeds.\r\n\r\n### Related Issues\r\nAddresses:\r\n- https://github.com/elastic/kibana/issues/171426\r\n- https://github.com/elastic/kibana/issues/187478\r\n- https://github.com/elastic/kibana/issues/187614\r\n- https://github.com/elastic/kibana/issues/182009\r\n- https://github.com/elastic/kibana/issues/171426\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n- [x] [Flaky Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was\r\nused on any tests changed\r\n- [x] [ESS Rule Execution FTR x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6528)\r\n- [x] [Serverless Rule Execution FTR x\r\n200](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/6529)\r\n\r\n\r\n### For maintainers\r\n\r\n- [x] This was checked for breaking API changes and was [labeled\r\nappropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)","sha":"3df635ef4a8c86c41c91ac5f59198a9b67d1dc8b"}}]}] BACKPORT-->
- Loading branch information