Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve task manager functional tests in preperation for mget task claimer being the default #196399

Merged
merged 13 commits into from
Oct 21, 2024

Conversation

mikecote
Copy link
Contributor

@mikecote mikecote commented Oct 15, 2024

Resolves #184942
Resolves #192023
Resolves #195573

In this PR, I'm improving the flakiness found in our functional tests in preperation for mget being the default task claimer that all these tests run with (#194625). Because the mget task claimer works differently and also polls more frequently, we end-up in situations where tasks run faster than they were with update_by_query, creating more race conditions that are now fixed in this PR.

Issues were surfaced via #190148 where I set mget as the default task claiming strategy.

Flaky test runs (some of these failed on other tests that are flaky):

@mikecote mikecote added release_note:skip Skip the PR/issue when compiling release notes Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v9.0.0 backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) v8.17.0 labels Oct 15, 2024
OneOfTaskTypes('task.taskType', claimPartitions.unlimitedTypes),
OneOfTaskTypes(
'task.taskType',
claimPartitions.unlimitedTypes.concat(Array.from(removedTypes))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a test to ensure removedTypes get marked as unrecognized, this should fix it.

Comment on lines +147 to +156
const result = await this.es.search(params, { meta: true });
result.body.hits.hits = result.body.hits.hits.map((hit) => {
return {
...hit,
// Easier to remove @timestamp than to have all the downstream code ignore it
// in their assertions
_source: omit(hit._source as Record<string, unknown>, '@timestamp'),
};
});
return result;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a rare case, we receive the document we expect 2nd in the list instead of first. Having a sort on timestamp will make sure it's consistent.

@@ -125,7 +125,7 @@ export default function apiKeyBackfillTests({ getService }: FtrProviderContext)
}

it('should wait to invalidate API key until backfill for rule is complete', async () => {
const start = moment().utc().startOf('day').subtract(7, 'days').toISOString();
const start = moment().utc().startOf('day').subtract(13, 'days').toISOString();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backfill jobs sometimes ran too fast, adding more back days will compensate for this so we still have the backfill running while doing assertions on the API keys to remove.

Comment on lines +113 to +119
await retry.try(async () => {
const { status, body: rule } = await supertest.get(
`${getUrlPrefix(Spaces.space1.id)}/api/alerting/rule/${ruleId}`
);
expect(status).to.eql(200);
expect(rule.execution_status.status).to.eql('active');
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes the event log documents are persisted before the rule finishes running and update its own status, causing this code to expect active while the rule is not updated yet.

@@ -797,7 +797,7 @@ export default function ({ getService }: FtrProviderContext) {
await retry.try(async () => {
const [scheduledTask] = (await currentTasks()).docs;
expect(scheduledTask.id).to.eql(task.id);
expect(scheduledTask.status).to.eql('claiming');
expect(['claiming', 'running'].includes(scheduledTask.status)).to.be(true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a claiming phase in mget, allowing running as well.

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#7151

[✅] x-pack/test/alerting_api_integration/observability/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config_non_dedicated_task_runner.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config_with_schedule_circuit_breaker.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config_non_dedicated_task_runner.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/action_task_params/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/basic/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group1/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group3/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group2/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/actions/config.ts: 10/10 tests passed.
[❌] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts: 0/10 tests passed.
[❌] x-pack/test/alerting_api_integration/security_and_spaces/group1/config.ts: 0/10 tests passed.
[❌] x-pack/test/task_manager_claimer_mget/config.ts: 4/10 tests passed.
[❌] x-pack/test/plugin_api_integration/config.ts: 4/10 tests passed.

see run history

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#7169

[✅] x-pack/test/alerting_api_integration/security_and_spaces/group1/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/observability/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config_with_schedule_circuit_breaker.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config_non_dedicated_task_runner.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group1/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/action_task_params/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group2/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/actions/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config_non_dedicated_task_runner.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group3/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts: 10/10 tests passed.
[✅] x-pack/test/alerting_api_integration/basic/config.ts: 10/10 tests passed.
[❌] x-pack/test/task_manager_claimer_mget/config.ts: 8/10 tests passed.
[❌] x-pack/test/plugin_api_integration/config.ts: 0/10 tests passed.

see run history

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#7172

[❌] x-pack/test/task_manager_claimer_mget/config.ts: 23/25 tests passed.
[❌] x-pack/test/plugin_api_integration/config.ts: 2/25 tests passed.

see run history

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#7175

[✅] x-pack/test/task_manager_claimer_mget/config.ts: 25/25 tests passed.
[✅] x-pack/test/plugin_api_integration/config.ts: 25/25 tests passed.

see run history

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#7176

[✅] x-pack/test/alerting_api_integration/basic/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config_with_schedule_circuit_breaker.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group3/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group2/config_non_dedicated_task_runner.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config_non_dedicated_task_runner.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/action_task_params/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/security_and_spaces/group4/config.ts: 12/12 tests passed.
[❌] x-pack/test/alerting_api_integration/security_and_spaces/group1/config.ts: 0/12 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group2/config.ts: 12/12 tests passed.
[❌] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group3/config.ts: 0/12 tests passed.
[❌] x-pack/test/alerting_api_integration/spaces_only/tests/actions/config.ts: 10/12 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/observability/config.ts: 12/12 tests passed.
[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group1/config.ts: 12/12 tests passed.

see run history

@mikecote mikecote marked this pull request as ready for review October 18, 2024 16:44
@mikecote mikecote requested a review from a team as a code owner October 18, 2024 16:44
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#7185

[✅] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts: 100/100 tests passed.

see run history

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mikecote mikecote enabled auto-merge (squash) October 21, 2024 12:34
@mikecote mikecote merged commit 3b8cf12 into elastic:main Oct 21, 2024
42 checks passed
@kibanamachine
Copy link
Contributor

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/11440714448

@elasticmachine
Copy link
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #27 / Alerting alerts_as_data alerts as data flapping should allow rule specific flapping to override space flapping

Metrics [docs]

✅ unchanged

History

@kibanamachine
Copy link
Contributor

💔 All backports failed

Status Branch Result
8.x Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 196399

Questions ?

Please refer to the Backport tool documentation

@mikecote
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

mikecote added a commit to mikecote/kibana that referenced this pull request Oct 21, 2024
…aimer being the default (elastic#196399)

Resolves elastic#184942
Resolves elastic#192023
Resolves elastic#195573

In this PR, I'm improving the flakiness found in our functional tests in
preperation for mget being the default task claimer that all these tests
run with (elastic#194625). Because the
mget task claimer works differently and also polls more frequently, we
end-up in situations where tasks run faster than they were with
update_by_query, creating more race conditions that are now fixed in
this PR.

Issues were surfaced via elastic#190148
where I set `mget` as the default task claiming strategy.

Flaky test runs (some of these failed on other tests that are flaky):
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185
(for
elastic@0fcf1ae)

(cherry picked from commit 3b8cf12)

# Conflicts:
#	x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/alerts_as_data/alerts_as_data_flapping.ts
mikecote added a commit that referenced this pull request Oct 21, 2024
…ask claimer being the default (#196399) (#197062)

# Backport

This will backport the following commits from `main` to `8.x`:
- [Improve task manager functional tests in preperation for mget task
claimer being the default
(#196399)](#196399)

<!--- Backport version: 8.9.8 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Mike
Côté","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-10-21T13:02:59Z","message":"Improve
task manager functional tests in preperation for mget task claimer being
the default (#196399)\n\nResolves
https://github.com/elastic/kibana/issues/184942\r\nResolves
https://github.com/elastic/kibana/issues/192023\r\nResolves
https://github.com/elastic/kibana/issues/195573\r\n\r\nIn this PR, I'm
improving the flakiness found in our functional tests in\r\npreperation
for mget being the default task claimer that all these tests\r\nrun with
(#194625). Because the\r\nmget
task claimer works differently and also polls more frequently,
we\r\nend-up in situations where tasks run faster than they were
with\r\nupdate_by_query, creating more race conditions that are now
fixed in\r\nthis PR.\r\n\r\nIssues were surfaced via
https://github.com/elastic/kibana/pull/190148\r\nwhere I set `mget` as
the default task claiming strategy.\r\n\r\nFlaky test runs (some of
these failed on other tests that are
flaky):\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185\r\n(for\r\nhttps://github.com//pull/196399/commits/0fcf1ae68927277a8f544278903edbf5912a1649)","sha":"3b8cf1236b1b6ba67862f35f47fcb250d88ac4c0","branchLabelMapping":{"^v9.0.0$":"main","^v8.17.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Feature:Task
Manager","Team:ResponseOps","v9.0.0","backport:prev-minor","v8.17.0"],"number":196399,"url":"https://github.com/elastic/kibana/pull/196399","mergeCommit":{"message":"Improve
task manager functional tests in preperation for mget task claimer being
the default (#196399)\n\nResolves
https://github.com/elastic/kibana/issues/184942\r\nResolves
https://github.com/elastic/kibana/issues/192023\r\nResolves
https://github.com/elastic/kibana/issues/195573\r\n\r\nIn this PR, I'm
improving the flakiness found in our functional tests in\r\npreperation
for mget being the default task claimer that all these tests\r\nrun with
(#194625). Because the\r\nmget
task claimer works differently and also polls more frequently,
we\r\nend-up in situations where tasks run faster than they were
with\r\nupdate_by_query, creating more race conditions that are now
fixed in\r\nthis PR.\r\n\r\nIssues were surfaced via
https://github.com/elastic/kibana/pull/190148\r\nwhere I set `mget` as
the default task claiming strategy.\r\n\r\nFlaky test runs (some of
these failed on other tests that are
flaky):\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185\r\n(for\r\nhttps://github.com//pull/196399/commits/0fcf1ae68927277a8f544278903edbf5912a1649)","sha":"3b8cf1236b1b6ba67862f35f47fcb250d88ac4c0"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","labelRegex":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/196399","number":196399,"mergeCommit":{"message":"Improve
task manager functional tests in preperation for mget task claimer being
the default (#196399)\n\nResolves
https://github.com/elastic/kibana/issues/184942\r\nResolves
https://github.com/elastic/kibana/issues/192023\r\nResolves
https://github.com/elastic/kibana/issues/195573\r\n\r\nIn this PR, I'm
improving the flakiness found in our functional tests in\r\npreperation
for mget being the default task claimer that all these tests\r\nrun with
(#194625). Because the\r\nmget
task claimer works differently and also polls more frequently,
we\r\nend-up in situations where tasks run faster than they were
with\r\nupdate_by_query, creating more race conditions that are now
fixed in\r\nthis PR.\r\n\r\nIssues were surfaced via
https://github.com/elastic/kibana/pull/190148\r\nwhere I set `mget` as
the default task claiming strategy.\r\n\r\nFlaky test runs (some of
these failed on other tests that are
flaky):\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176\r\n-\r\nhttps://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185\r\n(for\r\nhttps://github.com//pull/196399/commits/0fcf1ae68927277a8f544278903edbf5912a1649)","sha":"3b8cf1236b1b6ba67862f35f47fcb250d88ac4c0"}},{"branch":"8.x","label":"v8.17.0","labelRegex":"^v8.17.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment