Improve task manager functional tests in preparation for mget task claimer being the default #196399

mikecote · 2024-10-15T17:41:13Z

Resolves #184942
Resolves #192023
Resolves #195573

In this PR, I'm reducing the flakiness in our functional tests in preparation for mget becoming the default task claimer that all these tests run with (#194625). Because the mget task claimer works differently and also polls more frequently, tasks can now run faster than they did with update_by_query. This exposes more race conditions in the tests, which this PR fixes.

These issues surfaced in #190148, where I set mget as the default task claiming strategy.

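Flaky test runs (some of these failed on other, unrelated flaky tests):
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185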

mikecote · 2024-10-15T17:47:20Z

x-pack/plugins/task_manager/server/task_claimers/strategy_mget.ts

-      OneOfTaskTypes('task.taskType', claimPartitions.unlimitedTypes),
+      OneOfTaskTypes(
+        'task.taskType',
+        claimPartitions.unlimitedTypes.concat(Array.from(removedTypes))


We have a test that ensures removed task types get marked as unrecognized. Including removedTypes in the claim query's task type filter should fix it.
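A minimal sketch of what the filter change does, assuming a simplified terms query shape. The type list used for claiming now also contains task types that were removed, so those tasks are still fetched and can then be marked as unrecognized. The helpers oneOfTaskTypes and buildClaimTypeFilter, and the task type names, are hypothetical stand-ins, not Task Manager's actual OneOfTaskTypes implementation.

```typescript
// Hypothetical, simplified shape of an Elasticsearch terms query clause.
interface TermsQuery {
  terms: Record<string, string[]>;
}

// Hypothetical stand-in for OneOfTaskTypes: match documents whose `field`
// holds any of the given task types.
function oneOfTaskTypes(field: string, types: string[]): TermsQuery {
  return { terms: { [field]: types } };
}

// Task type filter for claiming: the unlimited types plus any removed types,
// so tasks of removed types are still fetched and can be marked unrecognized.
function buildClaimTypeFilter(
  unlimitedTypes: string[],
  removedTypes: Set<string>
): TermsQuery {
  return oneOfTaskTypes(
    'task.taskType',
    unlimitedTypes.concat(Array.from(removedTypes))
  );
}

// Example with made-up task type names.
const claimFilter = buildClaimTypeFilter(
  ['exampleTypeA', 'exampleTypeB'],
  new Set(['removedExampleType'])
);
console.log(JSON.stringify(claimFilter));
// {"terms":{"task.taskType":["exampleTypeA","exampleTypeB","removedExampleType"]}}
```

Without the concat, a task whose type was removed never matches the filter, so it is never picked up and never gets its status updated.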

mikecote · 2024-10-15T17:48:22Z

x-pack/test/alerting_api_integration/packages/helpers/es_test_index_tool.ts

+    const result = await this.es.search(params, { meta: true });
+    result.body.hits.hits = result.body.hits.hits.map((hit) => {
+      return {
+        ...hit,
+        // Easier to remove @timestamp than to have all the downstream code ignore it
+        // in their assertions
+        _source: omit(hit._source as Record<string, unknown>, '@timestamp'),
+      };
+    });
+    return result;


In rare cases, the document we expect first comes back second. Sorting on @timestamp makes the order consistent. The field is then stripped from each hit's _source so downstream assertions don't have to ignore it.
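The sort belongs in the search request itself. The sketch below reproduces the same ordering in memory so it runs without a cluster, then strips @timestamp the way the snippet above does. The Hit type and both helper functions are hypothetical, not the Elasticsearch client's types.

```typescript
// Minimal, hypothetical shape of a search hit.
interface Hit {
  _id: string;
  _source: Record<string, unknown>;
}

// Return a shallow copy of `source` without `key`.
function omitKey(source: Record<string, unknown>, key: string): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...source };
  delete copy[key];
  return copy;
}

// Order hits by ascending @timestamp, then drop the field from each _source.
// Server-side, the equivalent is a sort clause such as
// [{ '@timestamp': { order: 'asc' } }] on the search request.
function sortAndStripTimestamp(hits: Hit[]): Hit[] {
  const ts = (h: Hit): string => String(h._source['@timestamp']);
  return [...hits]
    // ISO-8601 strings in the same format compare correctly as plain strings.
    .sort((a, b) => (ts(a) < ts(b) ? -1 : ts(a) > ts(b) ? 1 : 0))
    .map((hit) => ({ ...hit, _source: omitKey(hit._source, '@timestamp') }));
}
```

Sorting before stripping matters: once @timestamp is removed, nothing in the hit records the order the documents were written in.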

mikecote · 2024-10-15T17:49:00Z

.../test/alerting_api_integration/security_and_spaces/group1/tests/alerting/backfill/api_key.ts

@@ -125,7 +125,7 @@ export default function apiKeyBackfillTests({ getService }: FtrProviderContext)
    }

    it('should wait to invalidate API key until backfill for rule is complete', async () => {
-      const start = moment().utc().startOf('day').subtract(7, 'days').toISOString();
+      const start = moment().utc().startOf('day').subtract(13, 'days').toISOString();


Backfill jobs sometimes finished too quickly. Backfilling 13 days instead of 7 compensates for this, so the backfill is still running while we make assertions on the API keys awaiting invalidation.
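To make the date arithmetic concrete, here is a native Date sketch of the start expression in the diff: midnight UTC today, minus N whole days. backfillStart is a hypothetical helper, not part of the test. For a fixed rule interval, a wider window means more executions to backfill, which is what keeps the job in progress longer.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Same result as moment().utc().startOf('day').subtract(days, 'days').toISOString().
// UTC has no DST shifts, so subtracting whole days in milliseconds is exact.
function backfillStart(days: number, now: Date = new Date()): string {
  const startOfDayUtc = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  return new Date(startOfDayUtc - days * MS_PER_DAY).toISOString();
}

// Fixed reference time for a reproducible example: 2024-10-18T15:30Z (month 9 is October).
const exampleNow = new Date(Date.UTC(2024, 9, 18, 15, 30));
console.log(backfillStart(7, exampleNow)); // 2024-10-11T00:00:00.000Z
console.log(backfillStart(13, exampleNow)); // 2024-10-05T00:00:00.000Z
```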

mikecote · 2024-10-15T17:50:25Z

...g_api_integration/spaces_only/tests/alerting/group4/builtin_alert_types/long_running/rule.ts

+      await retry.try(async () => {
+        const { status, body: rule } = await supertest.get(
+          `${getUrlPrefix(Spaces.space1.id)}/api/alerting/rule/${ruleId}`
+        );
+        expect(status).to.eql(200);
+        expect(rule.execution_status.status).to.eql('active');
+      });


Sometimes the event log documents are persisted before the rule finishes running and updates its own status, so the test expected active before the rule had been updated. Retrying the rule lookup until its execution status is active removes this race.
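The fix relies on the FTR retry.try service. As a rough sketch of the pattern only, assuming retry-until-no-throw semantics with a timeout, a standalone helper could look like the following. retryTry and getRuleStatus are hypothetical; the real service's timeouts and logging differ.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for retry.try: re-run `fn` until it stops throwing,
// or rethrow its last error once `timeoutMs` has elapsed.
async function retryTry<T>(fn: () => Promise<T>, timeoutMs = 5000, delayMs = 50): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      if (Date.now() >= deadline) throw err;
      await sleep(delayMs);
    }
  }
}

// Simulated rule whose execution status only becomes 'active' on the third read.
let polls = 0;
async function getRuleStatus(): Promise<string> {
  polls += 1;
  return polls < 3 ? 'pending' : 'active';
}

retryTry(async () => {
  const status = await getRuleStatus();
  if (status !== 'active') throw new Error(`expected active, got ${status}`);
  return status;
}).then((status) => console.log(`${status} after ${polls} polls`));
```

A single read would fail on the first poll. Wrapping the assertion in a retry turns the ordering race into a bounded wait.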

mikecote · 2024-10-15T17:50:54Z

x-pack/test/plugin_api_integration/test_suites/task_manager/task_management.ts

@@ -797,7 +797,7 @@ export default function ({ getService }: FtrProviderContext) {
      await retry.try(async () => {
        const [scheduledTask] = (await currentTasks()).docs;
        expect(scheduledTask.id).to.eql(task.id);
-        expect(scheduledTask.status).to.eql('claiming');
+        expect(['claiming', 'running'].includes(scheduledTask.status)).to.be(true);


The mget claimer has no claiming phase, so the assertion now accepts running as well.
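As a sketch, the relaxed check can also be written with a named set of acceptable statuses, which reads more clearly than an inline includes call if more states are ever added. The names below are hypothetical. Per the original assertion, the update_by_query claimer moved tasks through claiming first, while mget has no such phase.

```typescript
// Statuses accepted while a task is being picked up: 'claiming' comes from the
// update_by_query claimer, 'running' covers mget, which has no claiming phase.
const PICKED_UP_STATUSES: ReadonlySet<string> = new Set(['claiming', 'running']);

function isPickedUp(status: string): boolean {
  return PICKED_UP_STATUSES.has(status);
}

console.log(isPickedUp('running')); // true
console.log(isPickedUp('idle')); // false
```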

kibanamachine · 2024-10-16T15:31:41Z

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#7151

[✅] x-pack/test/alerting_api_integration/observability/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config_non_dedicated_task_runner.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config_with_schedule_circuit_breaker.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config_non_dedicated_task_runner.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/action_task_params/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/basic/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group1/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group3/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group2/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/actions/config.ts: 10/10 tests passed.
[❌] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts: 0/10 tests passed.
[❌] x-pack/test/alerting_api_integration/security_and_spaces/group1/config.ts: 0/10 tests passed.
[❌] x-pack/test/task_manager_claimer_mget/config.ts: 4/10 tests passed.
[❌] x-pack/test/plugin_api_integration/config.ts: 4/10 tests passed.



kibanamachine · 2024-10-17T20:19:19Z

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#7169

[✅] x-pack/test/alerting_api_integration/security_and_spaces/group1/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/observability/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config_with_schedule_circuit_breaker.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config_non_dedicated_task_runner.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group1/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/action_task_params/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group2/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/actions/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config_non_dedicated_task_runner.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group3/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/basic/config.ts: 10/10 tests passed.
[❌] x-pack/test/task_manager_claimer_mget/config.ts: 8/10 tests passed.
[❌] x-pack/test/plugin_api_integration/config.ts: 0/10 tests passed.


kibanamachine · 2024-10-17T23:03:51Z

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#7172

[❌] x-pack/test/task_manager_claimer_mget/config.ts: 23/25 tests passed.
[❌] x-pack/test/plugin_api_integration/config.ts: 2/25 tests passed.


kibanamachine · 2024-10-18T12:28:30Z

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#7175

[✅] x-pack/test/task_manager_claimer_mget/config.ts: 25/25 tests passed.
[✅] x-pack/test/plugin_api_integration/config.ts: 25/25 tests passed.


kibanamachine · 2024-10-18T16:17:24Z

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#7176

[✅] x-pack/test/alerting_api_integration/basic/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config_with_schedule_circuit_breaker.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config_non_dedicated_task_runner.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config_non_dedicated_task_runner.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/action_task_params/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config.ts: 12/12 tests passed.
[❌] x-pack/test/alerting_api_integration/security_and_spaces/group1/config.ts: 0/12 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group2/config.ts: 12/12 tests passed.
[❌] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group3/config.ts: 0/12 tests passed.
[❌] x-pack/test/alerting_api_integration/spaces_only/tests/actions/config.ts: 10/12 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/observability/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group1/config.ts: 12/12 tests passed.


elasticmachine · 2024-10-18T16:44:45Z

Pinging @elastic/response-ops (Team:ResponseOps)

kibanamachine · 2024-10-18T19:17:11Z

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#7185

[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts: 100/100 tests passed.


ymao1

LGTM

kibanamachine · 2024-10-21T13:03:18Z

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/11440714448

elasticmachine · 2024-10-21T13:03:42Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: e2988a2

Failed CI Steps

FTR Configs #27

Test Failures

[job] [logs] FTR Configs #27 / Alerting alerts_as_data alerts as data flapping should allow rule specific flapping to override space flapping

Metrics [docs]

✅ unchanged

History

💚 Build #243990 succeeded 0fcf1ae
💔 Build #243855 failed a2b5f30
💚 Build #243616 succeeded 7e9a539
💔 Build #243043 failed d0a492e
💔 Build #242811 failed 078a3c6
💔 Build #242787 failed ff57457

kibanamachine · 2024-10-21T13:08:01Z

💔 All backports failed

Status	Branch	Result
❌	8.x	Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 196399

Questions?

Please refer to the Backport tool documentation

mikecote · 2024-10-21T13:38:37Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions?

Please refer to the Backport tool documentation

…aimer being the default (elastic#196399)

Resolves elastic#184942
Resolves elastic#192023
Resolves elastic#195573

In this PR, I'm improving the flakiness found in our functional tests in preparation for mget being the default task claimer that all these tests run with (elastic#194625). Because the mget task claimer works differently and also polls more frequently, we end up in situations where tasks run faster than they were with update_by_query, creating more race conditions that are now fixed in this PR.

Issues were surfaced via elastic#190148 where I set `mget` as the default task claiming strategy.

Flaky test runs (some of these failed on other tests that are flaky):
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176
- https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185 (for elastic@0fcf1ae)

(cherry picked from commit 3b8cf12)

# Conflicts:
#	x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/alerts_as_data/alerts_as_data_flapping.ts

…ask claimer being the default (#196399) (#197062)

# Backport

This will backport the following commits from `main` to `8.x`:
- [Improve task manager functional tests in preparation for mget task claimer being the default (#196399)](#196399)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Initial commit

ff57457

mikecote commented Oct 15, 2024


mikecote added 2 commits October 15, 2024 14:36

Merge branch 'main' into task-manager/flaky-test-improvements

078a3c6

Merge branch 'main' into task-manager/flaky-test-improvements

d0a492e

mikecote mentioned this pull request Oct 16, 2024

[ResponseOps][Task Manager] unrecognized tasks are not updated when mget task claimer is used #192686

Closed

2 tasks

mikecote and others added 3 commits October 17, 2024 12:47

Merge branch 'main' into task-manager/flaky-test-improvements

0ca7458

More improvements

aa13ec6

Merge branch 'task-manager/flaky-test-improvements' of github.com:mikecote/kibana into task-manager/flaky-test-improvements

7e9a539

Add temporary log

64e5871

mikecote and others added 3 commits October 18, 2024 07:06

More improvements

6955117

Remove debug log

bcb0da7

Merge branch 'main' into task-manager/flaky-test-improvements

a2b5f30

Fix type check

49eb9e0

mikecote marked this pull request as ready for review October 18, 2024 16:44

mikecote requested a review from a team as a code owner October 18, 2024 16:44

Unskip flaky test

0fcf1ae

Merge branch 'main' into task-manager/flaky-test-improvements

e2988a2

ymao1 approved these changes Oct 21, 2024


mikecote enabled auto-merge (squash) October 21, 2024 12:34

mikecote merged commit 3b8cf12 into elastic:main Oct 21, 2024
42 checks passed

mikecote mentioned this pull request Oct 21, 2024

[8.x] Improve task manager functional tests in preparation for mget task claimer being the default (#196399) #197062

Merged


mikecote commented Oct 15, 2024 • edited by kibanamachine
