running tasks get spuriously killed #133

Open
jcristau opened this issue Feb 14, 2023 · 1 comment

@jcristau
Contributor

From time to time we hit an issue where a running scriptworker task gets killed. When k8s-autoscale periodically scans each pool, it counts pending tasks and running workers, and if it thinks there are too many running workers it tells k8s to stop them. scriptworker gets SIGUSR1, which tells it to stop after the current task. However, if the task is not done after terminationGracePeriodSeconds (currently 20 minutes, except for treescript where it's 1 hour), scriptworker gets SIGTERM and terminates the running task, which then has to be rerun for no good reason.
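To make the sequence concrete, here is a minimal sketch of the "stop after the current task" behaviour and of the grace-period race. It is a toy model, not scriptworker's actual code: `Worker`, `install_stop_handler` and `task_survives` are hypothetical names, and the only values taken from the description above are SIGUSR1 and the 20 minute / 1 hour grace periods.

```python
import signal

GRACE_PERIOD_SECONDS = 20 * 60  # value quoted above; treescript uses 3600


class Worker:
    """Toy worker loop (hypothetical, not scriptworker's real code)."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.stop_requested = False
        self.completed = []

    def request_stop(self, signum=None, frame=None):
        # Only record the request; the task in progress keeps running.
        self.stop_requested = True

    def run(self):
        for task in self.tasks:
            if self.stop_requested:
                break  # stop between tasks, never in the middle of one
            task(self)
            self.completed.append(task.__name__)
        return self.completed


def install_stop_handler(worker):
    # SIGUSR1 is POSIX-only; call this from the main thread.
    signal.signal(signal.SIGUSR1, worker.request_stop)


def task_survives(seconds_remaining, grace_period=GRACE_PERIOD_SECONDS):
    # The task is only safe if it finishes before the grace period ends.
    return seconds_remaining <= grace_period
```

Under these assumptions, a task with 30 minutes left when the stop request arrives does not survive a 20 minute grace period (`task_survives(30 * 60)` is false), while the same task would survive with treescript's 1 hour setting.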

@gabrielBusta
Member

gabrielBusta commented Feb 14, 2023

Maybe the workers could shut down gracefully and tell k8s-autoscale when they are done with the current task?
