Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Response Ops][Task Manager] Adding integration test to ensure no WorkloadAggregator errors when there are unrecognized task types. #193479

Merged
merged 5 commits into from
Sep 25, 2024

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Sep 19, 2024

Fixes https://github.com/elastic/kibana-team/issues/1036

Summary

Adding integration test as RCA action for incident where unrecognized task types was causing issues generating the workload portion of the task manager health report.

To verify

Add this line to your code to that will throw an error when there are unrecognized task types when generating the health report

--- a/x-pack/plugins/task_manager/server/task_type_dictionary.ts
+++ b/x-pack/plugins/task_manager/server/task_type_dictionary.ts
@@ -128,6 +128,7 @@ export class TaskTypeDictionary {
   }

   public get(type: string): TaskDefinition | undefined {
+    this.ensureHas(type);
     return this.definitions.get(type);
   }

Run the integration test node scripts/jest_integration.js x-pack/plugins/task_manager/server/integration_tests/removed_types.test.ts and see that it fails because a WorkloadAggregator error is logged.

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#6983

[❌] x-pack/test/plugin_api_integration/config.ts: 13/100 tests passed.

see run history

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#6989

[❌] x-pack/test/plugin_api_integration/config.ts: 12/100 tests passed.

see run history

@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#6991

[❌] x-pack/test/plugin_api_integration/config.ts: 14/100 tests passed.

see run history

@ymao1 ymao1 changed the title Adding check for workload values [Response Ops][Task Manager] Adding integration test to ensure no WorkloadAggregator errors when there are unrecognized task types. Sep 23, 2024
@ymao1 ymao1 self-assigned this Sep 23, 2024
@ymao1 ymao1 added release_note:skip Skip the PR/issue when compiling release notes Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v9.0.0 v8.16.0 labels Sep 23, 2024
@ymao1 ymao1 marked this pull request as ready for review September 23, 2024 19:07
@ymao1 ymao1 requested a review from a team as a code owner September 23, 2024 19:07
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@ymao1 ymao1 added the backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) label Sep 23, 2024
@ymao1 ymao1 requested review from pmuellr and guskovaue September 23, 2024 19:08
@pmuellr
Copy link
Member

pmuellr commented Sep 24, 2024

Should we run the flaky tests again? Didn't see any running right now - or maybe you were using those for some research?

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left a note about changing the task claimer :-)

const setupResult = await setupTestServers({
xpack: {
task_manager: {
claim_strategy: `update_by_query`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm ... should we test both claimers? I think for the test this is ultimately making, it's not needed.

But I was curious about hard-coding the claim_strategy, and I did notice a lot of usage of claim_strategy: 'default' in the code (search on claim_strategy), which is presumably logging an error once, and then returning "the default". :-)

export function getTaskClaimer(logger: Logger, strategy: string): TaskClaimerFn {
switch (strategy) {
case CLAIM_STRATEGY_UPDATE_BY_QUERY:
return claimAvailableTasksUpdateByQuery;
case CLAIM_STRATEGY_MGET:
return claimAvailableTasksMget;
}
if (!WarnedOnInvalidClaimer) {
WarnedOnInvalidClaimer = true;
logger.warn(`Unknown task claiming strategy "${strategy}", falling back to update_by_query`);
}
return claimAvailableTasksUpdateByQuery;
}

So, thinking maybe change this to default - if testing with one is fine. That way it'll track the default we use, when we switch the default to mget.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I can remove the hard-coded claimer and it will just use whatever the default is, whenever we switch it, it'll test the default? That makes sense to me.

I can create a followup issue for the default claim strategies. Probably an oversight from when we switched from default to update_by_query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated all the claim_strategy: 'default' to update_by_query in this commit: a58040c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the hard-coded claim strategy in the jest integration test in this commit: f463ff7

@ymao1
Copy link
Contributor Author

ymao1 commented Sep 24, 2024

Should we run the flaky tests again? Didn't see any running right now - or maybe you were using those for some research?

I originally tried updating a functional test and was running the flaky test runner on that FT but abandoned that approach in favor of a jest integration test.

Copy link
Contributor

@guskovaue guskovaue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ymao1
Copy link
Contributor Author

ymao1 commented Sep 25, 2024

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit 01eae15 into elastic:main Sep 25, 2024
38 checks passed
@ymao1 ymao1 deleted the rca-issue-1036 branch September 25, 2024 14:22
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 25, 2024
…rkloadAggregator` errors when there are unrecognized task types. (elastic#193479)

Fixes elastic/kibana-team#1036

## Summary

Adding integration test as RCA action for incident where unrecognized
task types was causing issues generating the workload portion of the
task manager health report.

## To verify

Add this line to your code to that will throw an error when there are
unrecognized task types when generating the health report

```
--- a/x-pack/plugins/task_manager/server/task_type_dictionary.ts
+++ b/x-pack/plugins/task_manager/server/task_type_dictionary.ts
@@ -128,6 +128,7 @@ export class TaskTypeDictionary {
   }

   public get(type: string): TaskDefinition | undefined {
+    this.ensureHas(type);
     return this.definitions.get(type);
   }
```

Run the integration test `node scripts/jest_integration.js
x-pack/plugins/task_manager/server/integration_tests/removed_types.test.ts`
and see that it fails because a `WorkloadAggregator` error is logged.

---------

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 01eae15)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Sep 25, 2024
…no &#x60;WorkloadAggregator&#x60; errors when there are unrecognized task types. (#193479) (#194016)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Response Ops][Task Manager] Adding integration test to ensure no
&#x60;WorkloadAggregator&#x60; errors when there are unrecognized task
types. (#193479)](#193479)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Ying
Mao","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-09-25T14:22:11Z","message":"[Response
Ops][Task Manager] Adding integration test to ensure no
`WorkloadAggregator` errors when there are unrecognized task types.
(#193479)\n\nFixes
https://github.com/elastic/kibana-team/issues/1036\r\n\r\n##
Summary\r\n\r\nAdding integration test as RCA action for incident where
unrecognized\r\ntask types was causing issues generating the workload
portion of the\r\ntask manager health report.\r\n\r\n## To
verify\r\n\r\nAdd this line to your code to that will throw an error
when there are\r\nunrecognized task types when generating the health
report\r\n\r\n```\r\n---
a/x-pack/plugins/task_manager/server/task_type_dictionary.ts\r\n+++
b/x-pack/plugins/task_manager/server/task_type_dictionary.ts\r\n@@
-128,6 +128,7 @@ export class TaskTypeDictionary {\r\n }\r\n\r\n public
get(type: string): TaskDefinition | undefined {\r\n+
this.ensureHas(type);\r\n return this.definitions.get(type);\r\n
}\r\n```\r\n\r\nRun the integration test `node
scripts/jest_integration.js\r\nx-pack/plugins/task_manager/server/integration_tests/removed_types.test.ts`\r\nand
see that it fails because a `WorkloadAggregator` error is
logged.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine
<[email protected]>","sha":"01eae1556266c8377f6557f4ccacc53e0b4db7fc","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Feature:Task
Manager","Team:ResponseOps","v9.0.0","backport:prev-minor","v8.16.0"],"title":"[Response
Ops][Task Manager] Adding integration test to ensure no
`WorkloadAggregator` errors when there are unrecognized task
types.","number":193479,"url":"https://github.com/elastic/kibana/pull/193479","mergeCommit":{"message":"[Response
Ops][Task Manager] Adding integration test to ensure no
`WorkloadAggregator` errors when there are unrecognized task types.
(#193479)\n\nFixes
https://github.com/elastic/kibana-team/issues/1036\r\n\r\n##
Summary\r\n\r\nAdding integration test as RCA action for incident where
unrecognized\r\ntask types was causing issues generating the workload
portion of the\r\ntask manager health report.\r\n\r\n## To
verify\r\n\r\nAdd this line to your code to that will throw an error
when there are\r\nunrecognized task types when generating the health
report\r\n\r\n```\r\n---
a/x-pack/plugins/task_manager/server/task_type_dictionary.ts\r\n+++
b/x-pack/plugins/task_manager/server/task_type_dictionary.ts\r\n@@
-128,6 +128,7 @@ export class TaskTypeDictionary {\r\n }\r\n\r\n public
get(type: string): TaskDefinition | undefined {\r\n+
this.ensureHas(type);\r\n return this.definitions.get(type);\r\n
}\r\n```\r\n\r\nRun the integration test `node
scripts/jest_integration.js\r\nx-pack/plugins/task_manager/server/integration_tests/removed_types.test.ts`\r\nand
see that it fails because a `WorkloadAggregator` error is
logged.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine
<[email protected]>","sha":"01eae1556266c8377f6557f4ccacc53e0b4db7fc"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/193479","number":193479,"mergeCommit":{"message":"[Response
Ops][Task Manager] Adding integration test to ensure no
`WorkloadAggregator` errors when there are unrecognized task types.
(#193479)\n\nFixes
https://github.com/elastic/kibana-team/issues/1036\r\n\r\n##
Summary\r\n\r\nAdding integration test as RCA action for incident where
unrecognized\r\ntask types was causing issues generating the workload
portion of the\r\ntask manager health report.\r\n\r\n## To
verify\r\n\r\nAdd this line to your code to that will throw an error
when there are\r\nunrecognized task types when generating the health
report\r\n\r\n```\r\n---
a/x-pack/plugins/task_manager/server/task_type_dictionary.ts\r\n+++
b/x-pack/plugins/task_manager/server/task_type_dictionary.ts\r\n@@
-128,6 +128,7 @@ export class TaskTypeDictionary {\r\n }\r\n\r\n public
get(type: string): TaskDefinition | undefined {\r\n+
this.ensureHas(type);\r\n return this.definitions.get(type);\r\n
}\r\n```\r\n\r\nRun the integration test `node
scripts/jest_integration.js\r\nx-pack/plugins/task_manager/server/integration_tests/removed_types.test.ts`\r\nand
see that it fails because a `WorkloadAggregator` error is
logged.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine
<[email protected]>","sha":"01eae1556266c8377f6557f4ccacc53e0b4db7fc"}},{"branch":"8.x","label":"v8.16.0","branchLabelMappingKey":"^v8.16.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Ying Mao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) Feature:Task Manager release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants