Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task Manager] Don't reset number of skipped runs as long as there is a validation error #172327

Merged
merged 2 commits into from
Dec 11, 2023

Conversation

ersin-erdal
Copy link
Contributor

resolves: #172325

@ersin-erdal ersin-erdal added release_note:fix Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.12.0 labels Nov 30, 2023
@ersin-erdal ersin-erdal self-assigned this Nov 30, 2023
@ersin-erdal ersin-erdal marked this pull request as ready for review November 30, 2023 21:32
@ersin-erdal ersin-erdal requested a review from a team as a code owner November 30, 2023 21:32
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@ersin-erdal ersin-erdal changed the title [Tak Manager] Dont reset number of skipped runs as long as there is a validation error [Task Manager] Dont reset number of skipped runs as long as there is a validation error Nov 30, 2023
@ersin-erdal ersin-erdal changed the title [Task Manager] Dont reset number of skipped runs as long as there is a validation error [Task Manager] Don't reset number of skipped runs as long as there is a validation error Nov 30, 2023
@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ersin-erdal

@ymao1
Copy link
Contributor

ymao1 commented Dec 1, 2023

I think we should discuss whether this is the right approach or whether we should be ensuring that the task is validating using the "indirect params" when running, meaning if task manager validates the "indirect params" 100 times and skips due to validation error, on the 101st run, task manager should run the task and the task itself should be throwing an error (resulting in a failed task run) based solely on the fact that the indirect params didn't validate (whether or not the rest of the task would have succeeded).

cc @mikecote

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified this works as described. For a recurring task, it skips it the MAX_SKIP_ATTEMPTS due to validation error, then runs it. If the run is successful, it does not reset the skip attempts so the next run (at the regularly scheduled interval), the task will not get unnecessarily skipped again.

@ymao1 ymao1 added v8.13.0 and removed v8.12.0 labels Dec 7, 2023
@ersin-erdal ersin-erdal merged commit cb41301 into elastic:main Dec 11, 2023
33 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Dec 11, 2023
@ersin-erdal ersin-erdal deleted the 172325-do-not-reset-num-skips branch December 11, 2023 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:fix Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] TaskManager resets number of skipped runs after a successful run when there is still a validation error.
5 participants