Consistent scheduling when tasks run within the poll interval of their original time #190093
Flaky Test Runner Stats: 🟠 Some tests failed. kibana-flaky-test-suite-runner#6917: [❌] x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group3/config.ts: 0/100 tests passed.
Pinging @elastic/response-ops (Team:ResponseOps) |
LGTM. Left some nits
}

const taskSchedule = newSchedule?.interval ?? schedule?.interval;
const taskDelay = Date.now() - originalRunAt.getTime();
Should this use `startedAt` instead of `Date.now()`?
Good catch, otherwise it would only work if the task finished running within the poll interval 🙈 fixed in 3d5eb39.
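To illustrate the point raised in this thread, here is a minimal sketch. The helper names, the 3-second poll interval, and the timings are assumptions chosen for illustration, not values from the PR. Measuring the delay at the moment of calculation (after the task has finished) folds the task's run time into the delay, whereas measuring it at `startedAt` captures only how late the task was picked up.

```typescript
// Hypothetical helpers for illustration; not Kibana's actual code.

// Delay measured when the next run is calculated, i.e. after the task ran.
function delayAtCompletion(originalRunAt: Date, nowMs: number): number {
  return nowMs - originalRunAt.getTime();
}

// Delay measured when the task was picked up.
function delayAtPickup(originalRunAt: Date, startedAt: Date): number {
  return startedAt.getTime() - originalRunAt.getTime();
}

const pollIntervalMs = 3000; // assumed value
const originalRunAt = new Date('2024-01-01T00:00:00.000Z');
const startedAt = new Date(originalRunAt.getTime() + 500); // picked up 500 ms late
const finishedAtMs = startedAt.getTime() + 5000; // the task ran for 5 s

const completionDelay = delayAtCompletion(originalRunAt, finishedAtMs);
const pickupDelay = delayAtPickup(originalRunAt, startedAt);

console.log(completionDelay < pollIntervalMs); // false: run time pushed it over
console.log(pickupDelay < pollIntervalMs); // true: the pickup itself was on time
```

With the completion-based measurement, any task that runs longer than the poll interval would look late even when it was picked up on time.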
return newRunAt;
}

const taskSchedule = newSchedule?.interval ?? schedule?.interval;
nit: since this is also calculated in the task runner, we could set `const scheduleToUse = newSchedule ?? schedule` in the task runner and pass it into the function.
LGTM
@@ -162,6 +165,7 @@ export class TaskManagerRunner implements TaskRunner {
  private eventLoopDelayConfig: EventLoopDelayConfig;
  private readonly taskValidator: TaskValidator;
  private readonly claimStrategy: string;
  private currentPollInterval?: number;
Should we set a default in case something goes wrong with the observable?
Makes sense, I defaulted it to the `config.poll_interval` in this commit: d30bbfc
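The idea of defaulting to the configured value can be sketched as follows. This is only an illustration under stated assumptions: `FakeIntervalSource` and `PollIntervalHolder` are hypothetical stand-ins (the PR itself uses an RxJS observable), and the numbers are made up.

```typescript
// Hypothetical stand-ins for illustration; not Kibana's actual classes.
type IntervalListener = (pollIntervalMs: number) => void;

class FakeIntervalSource {
  private listeners: IntervalListener[] = [];

  subscribe(listener: IntervalListener): void {
    this.listeners.push(listener);
  }

  emit(pollIntervalMs: number): void {
    for (const listener of this.listeners) {
      listener(pollIntervalMs);
    }
  }
}

class PollIntervalHolder {
  private currentPollInterval: number;

  constructor(configuredPollInterval: number, source: FakeIntervalSource) {
    // Start from the configured value so there is always a usable number,
    // even if the source never emits.
    this.currentPollInterval = configuredPollInterval;
    source.subscribe((value) => {
      this.currentPollInterval = value;
    });
  }

  get(): number {
    return this.currentPollInterval;
  }
}

const source = new FakeIntervalSource();
const holder = new PollIntervalHolder(3000, source);
const beforeEmit = holder.get(); // the configured default
source.emit(5000);
const afterEmit = holder.get(); // the most recently emitted value
```

Initializing the field in the constructor also removes the need for an optional (`?: number`) type on the field.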
💛 Build succeeded, but was flaky
…r original time (elastic#190093)

Resolves elastic#189114

In this PR, I'm changing the logic that calculates the task's next run at. Whenever the gap between the task's runAt and when it was picked up is less than the poll interval, we'll use the `runAt` to schedule the next run. This way we don't continuously add time to the task's next run (ex: running every 1m turns into every 1m 3s).

I've had to modify a few tests to use a longer interval because this change made tasks run more frequently (on time), which introduced flakiness.

## To verify

1. Create an alerting rule that runs every 10s
2. Apply the following diff to your code
```
diff --git a/x-pack/plugins/task_manager/server/lib/get_next_run_at.ts b/x-pack/plugins/task_manager/server/lib/get_next_run_at.ts
index 55d5f85e5d3..4342dcdd845 100644
--- a/x-pack/plugins/task_manager/server/lib/get_next_run_at.ts
+++ b/x-pack/plugins/task_manager/server/lib/get_next_run_at.ts
@@ -31,5 +31,7 @@ export function getNextRunAt(
     Date.now()
   );

+  console.log(`*** Next run at: ${new Date(nextCalculatedRunAt).toISOString()}, interval=${newSchedule?.interval ?? schedule.interval}, originalRunAt=${originalRunAt.toISOString()}, startedAt=${startedAt.toISOString()}`);
+
   return new Date(nextCalculatedRunAt);
 }
```
3. Observe the logs: the gap between runAt and startedAt should be less than the poll interval, so the next run at is based on `runAt` instead of `startedAt`.
4. Stop Kibana for 15 seconds, then start it again
5. Observe the first logs when the rule runs again and notice that, now that the gap between runAt and startedAt is larger than the poll interval, the next run at is based on `startedAt` instead of `runAt` to spread the tasks out evenly.

---------

Co-authored-by: Elastic Machine <[email protected]>

(cherry picked from commit 1f673dc)
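The rule described in this commit message can be sketched roughly as below. This is not the code from `get_next_run_at.ts`: the function name, the parameter shape, and the interval in milliseconds are assumptions for illustration, and the clamp to the current time is inferred from the `Date.now()` argument visible in the diff.

```typescript
// Hypothetical sketch; names and signature are illustrative, not Kibana's API.
interface NextRunInput {
  runAt: Date; // when the task was scheduled to run
  startedAt: Date; // when the task was actually picked up
  intervalMs: number; // schedule interval in milliseconds (assumed unit)
}

function sketchNextRunAt(
  input: NextRunInput,
  pollIntervalMs: number,
  nowMs: number = Date.now()
): Date {
  const taskDelay = input.startedAt.getTime() - input.runAt.getTime();

  // Picked up within one poll interval: treat the task as on time and keep
  // the original cadence. Otherwise schedule from the actual start time.
  const base = taskDelay < pollIntervalMs ? input.runAt : input.startedAt;

  // Never schedule in the past (assumed from the Date.now() call in the diff).
  return new Date(Math.max(base.getTime() + input.intervalMs, nowMs));
}

const t0 = Date.parse('2024-01-01T00:00:00.000Z');

// Picked up 1 s late with a 3 s poll interval: next run stays on the 10 s grid.
const onTime = sketchNextRunAt(
  { runAt: new Date(t0), startedAt: new Date(t0 + 1000), intervalMs: 10000 },
  3000,
  t0 + 2000
);

// Picked up 15 s late (e.g. after a restart): next run is based on startedAt.
const late = sketchNextRunAt(
  { runAt: new Date(t0), startedAt: new Date(t0 + 15000), intervalMs: 10000 },
  3000,
  t0 + 16000
);

console.log(onTime.toISOString()); // 2024-01-01T00:00:10.000Z
console.log(late.toISOString()); // 2024-01-01T00:00:25.000Z
```

In the on-time case the 1 s pickup delay is not carried forward, which is what prevents a 1m schedule from drifting toward 1m 3s.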
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
…f their original time (#190093) (#193022)

# Backport

This will backport the following commits from `main` to `8.x`:
- [Consistent scheduling when tasks run within the poll interval of their original time (#190093)](#190093)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Mike Côté <[email protected]>
…terval of their original time (elastic#190093) (elastic#193022)" This reverts commit 64e5384.
… poll interval of their original time (elastic#190093) (elastic#193022)"" This reverts commit 6296a6e.
In this PR, I'm fixing a memory leak that was introduced in #190093: every task runner class object subscribed to the `pollIntervalConfiguration$` observable, so it was never freed from memory. To fix this, I moved the observable up a class into `TaskPollingLifecycle`, which only gets created once on plugin start, and then pass the pollInterval value down via a function the task runner class can call.

(cherry picked from commit cf6e8b5)
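The shape of this fix can be sketched as follows. All class names here are hypothetical stand-ins rather than Kibana's actual classes, and a tiny hand-rolled feed replaces the RxJS observable so the example has no dependencies. The point it shows: a single long-lived object subscribes once, and each short-lived runner only receives a getter function, so the long-lived source never holds a reference to a runner.

```typescript
// Hypothetical stand-ins for illustration; not Kibana's actual classes.
type PollIntervalListener = (pollIntervalMs: number) => void;

class FakePollIntervalFeed {
  private listeners: PollIntervalListener[] = [];

  subscribe(listener: PollIntervalListener): void {
    this.listeners.push(listener);
  }

  emit(pollIntervalMs: number): void {
    for (const listener of this.listeners) {
      listener(pollIntervalMs);
    }
  }

  get listenerCount(): number {
    return this.listeners.length;
  }
}

// Short-lived: one instance per task run. It never subscribes to the feed.
class RunnerSketch {
  private readonly getPollInterval: () => number;

  constructor(getPollInterval: () => number) {
    this.getPollInterval = getPollInterval;
  }

  pollInterval(): number {
    return this.getPollInterval();
  }
}

// Long-lived: created once. It owns the only subscription to the feed.
class LifecycleSketch {
  private currentPollInterval: number;

  constructor(initialPollInterval: number, feed: FakePollIntervalFeed) {
    this.currentPollInterval = initialPollInterval;
    feed.subscribe((value) => {
      this.currentPollInterval = value;
    });
  }

  createRunner(): RunnerSketch {
    return new RunnerSketch(() => this.currentPollInterval);
  }
}

const feed = new FakePollIntervalFeed();
const lifecycle = new LifecycleSketch(3000, feed);
const runners = Array.from({ length: 100 }, () => lifecycle.createRunner());
feed.emit(5000);

console.log(feed.listenerCount); // 1
console.log(runners[0].pollInterval()); // 5000
```

Because the feed's listener list contains only the lifecycle's callback, discarded runners are not kept reachable by it, however many are created.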
# Backport

This will backport the following commits from `main` to `8.x`:
- [Fix memory leak in task manager task runner (#193612)](#193612)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Mike Côté <[email protected]>