running tasks get spuriously killed #133

jcristau · 2023-02-14T16:43:07Z

From time to time we hit an issue where a running scriptworker task gets killed. When k8s-autoscale periodically scans each pool, it counts pending tasks and running workers, and if it thinks there are too many running workers, it tells k8s to stop some of them. scriptworker then gets SIGUSR1, which tells it to stop after the current task. However, if the task isn't done after terminationGracePeriodSeconds (currently 20 minutes, except for treescript where it's 1 hour), scriptworker gets SIGTERM and terminates the running task, which then has to be rerun for no good reason.
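The failure mode can be modeled in a few lines. This is a minimal Python sketch, not k8s-autoscale's actual logic: the scale-down rule in `workers_to_stop` (target replicas derived from pending tasks alone) is an assumption, and the names `workers_to_stop`, `task_outcome` and `GRACE_PERIOD_S` are hypothetical. Only the two grace periods come from this issue.

```python
# Grace periods quoted in this issue (terminationGracePeriodSeconds), in seconds.
GRACE_PERIOD_S = {"default": 20 * 60, "treescript": 60 * 60}


def workers_to_stop(pending_tasks: int, running_workers: int, max_replicas: int) -> int:
    """Hypothetical scale-down rule: target replicas = pending tasks, capped.

    A task that a worker has already claimed is no longer pending, so under
    this assumed rule a busy worker can be counted as surplus.
    """
    target = min(pending_tasks, max_replicas)
    return max(0, running_workers - target)


def task_outcome(remaining_task_s: float, script: str = "default") -> str:
    """What happens to the in-flight task once its pod is told to stop.

    SIGUSR1 only asks scriptworker to exit after the current task; if the
    task is still running when the grace period expires, SIGTERM kills it.
    """
    grace_s = GRACE_PERIOD_S.get(script, GRACE_PERIOD_S["default"])
    return "completed" if remaining_task_s <= grace_s else "killed by SIGTERM"


# Three running workers (each busy on a claimed task), nothing pending:
# all three look like surplus under the assumed rule.
print(workers_to_stop(pending_tasks=0, running_workers=3, max_replicas=10))
# A task with 25 minutes left survives on treescript but not elsewhere.
print(task_outcome(25 * 60))
print(task_outcome(25 * 60, "treescript"))
```

Under these assumptions, whether a task survives depends only on how much work it has left relative to a fixed grace period, not on whether stopping that worker was actually necessary.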

gabrielBusta · 2023-02-14T16:50:34Z

Maybe the workers could shut down gracefully and tell k8s-autoscale when they are done with the current task?
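A minimal Python sketch of that suggestion, under stated assumptions: `GracefulWorker` is a hypothetical toy loop, not scriptworker's code, and `on_drained` is a hypothetical hook standing in for however the worker would report back to k8s-autoscale. It only shows the "finish the current task on SIGUSR1, then signal completion" shape.

```python
import signal
from typing import Callable, Iterable


class GracefulWorker:
    """Toy worker loop: SIGUSR1 means "finish the current task, then exit"."""

    def __init__(self, on_drained: Callable[[], None]) -> None:
        self.stop_requested = False
        # Hypothetical hook through which the worker would tell the autoscaler it is done.
        self.on_drained = on_drained
        # signal.signal must be called from the main thread; SIGUSR1 is POSIX-only.
        signal.signal(signal.SIGUSR1, self._request_stop)

    def _request_stop(self, signum, frame) -> None:
        # Only record the request; the task in progress is not interrupted.
        self.stop_requested = True

    def run(self, tasks: Iterable[Callable[[], None]]) -> int:
        """Run tasks until a stop is requested; return how many completed."""
        completed = 0
        for task in tasks:
            if self.stop_requested:
                break
            task()
            completed += 1
        # Report "drained" so the pod can be removed now, instead of the
        # scaler relying on a fixed grace period to expire.
        self.on_drained()
        return completed
```

With this shape, the decision to delete the pod could wait for the worker's "drained" report rather than for terminationGracePeriodSeconds to run out; how that report would reach k8s-autoscale is left open here.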
