Task Manager: Cancelling a workflow should kill all associated slurm jobs #689

Open
rstyd opened this issue May 9, 2023 · 3 comments
Labels
discuss We need to discuss this issue. enhancement New feature or request

Comments

@rstyd
Collaborator

rstyd commented May 9, 2023

Our initial plan was that cancelling a workflow would leave all of its tasks running on the task manager side. That requires a lot less work on our part, but it can cause problems for users. For example, a long-running task might eat up the time they have available on a partition, and a workflow with many small independent tasks would force the user to stop each one individually. Additionally, a job can fail without the user being aware of it (e.g. one of the tasks fails), so it could take the user quite a while to notice and manually cancel the Slurm jobs.

I propose that we add a cancel-workflow endpoint to the workflow manager. It would remove all jobs currently in the submit queue for the specified workflow and call the Slurm worker's cancel_task function on each task currently in the submit queue that belongs to the cancelled workflow.
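
To make the proposal concrete, here is a rough sketch of what the cancel logic could look like. It is not the existing task manager code: `Task`, `FakeWorker`, and `cancel_workflow` are hypothetical names, and the separate `running` list of already-submitted tasks is my assumption (the text above only mentions the submit queue). Only `cancel_task` comes from the proposal itself.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record: just an ID and the workflow it belongs to."""
    task_id: str
    workflow_id: str


class FakeWorker:
    """Stand-in for the Slurm worker; records which tasks it was asked to cancel."""

    def __init__(self):
        self.cancelled = []

    def cancel_task(self, task_id):
        # A real worker would ask Slurm to cancel the job at this point.
        self.cancelled.append(task_id)


def cancel_workflow(workflow_id, submit_queue, running, worker):
    """Drop queued tasks for workflow_id and cancel the ones already submitted.

    Returns (remaining_queue, cancelled_task_ids).
    """
    # Tasks that were never submitted only need to be removed from the queue.
    remaining = [t for t in submit_queue if t.workflow_id != workflow_id]

    # Assumption: tasks already handed to the worker are tracked in `running`
    # and need an explicit cancel call.
    cancelled = []
    for task in running:
        if task.workflow_id == workflow_id:
            worker.cancel_task(task.task_id)
            cancelled.append(task.task_id)
    return remaining, cancelled
```

Tasks from other workflows are left alone in both lists, which is the main property the endpoint would need to guarantee.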

@rstyd rstyd added the enhancement New feature or request label May 9, 2023
@pagrubel
Collaborator

@rstyd We should give an option for the user to let running jobs continue.

@pagrubel pagrubel added the discuss We need to discuss this issue. label Dec 13, 2023
@pagrubel
Collaborator

We need a definite plan for cancelling workflows.
Right now, if a workflow is cancelled, its jobs continue to run, but the recorded states stay at whatever point they were at when the workflow was cancelled. If we allow jobs to continue, we need a workflow "Cancelling" state that lasts until they have completed, and we should probably archive the workflow, since some of it may have completed.
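
One way to picture the "Cancelling" state is as a small state machine. The sketch below is only an illustration of the idea: `WorkflowState`, `advance`, and the `active_jobs` count are hypothetical names, not existing code, and it assumes a cancelled workflow is archived as soon as no jobs remain.

```python
from enum import Enum


class WorkflowState(Enum):
    RUNNING = "Running"
    CANCELLING = "Cancelling"
    ARCHIVED = "Archived"


def advance(state, cancel_requested, active_jobs):
    """Return the next workflow state.

    cancel_requested: True if the user has asked to cancel the workflow.
    active_jobs: number of jobs from this workflow that are still running.
    """
    if state is WorkflowState.RUNNING and cancel_requested:
        # Stay in Cancelling while jobs are allowed to finish.
        if active_jobs > 0:
            return WorkflowState.CANCELLING
        return WorkflowState.ARCHIVED
    if state is WorkflowState.CANCELLING and active_jobs == 0:
        # All remaining jobs are done, so the partial results can be archived.
        return WorkflowState.ARCHIVED
    return state
```

With this shape, the recorded state keeps moving after a cancel request instead of freezing at the point of cancellation.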

@pagrubel
Collaborator

pagrubel commented Dec 2, 2024

#960 fixed the issues with cancelling a workflow and now archives the results (i.e. there is an archive of whatever ran).
We still need an option that the user can invoke to cancel all running jobs if they choose to do so. Something like
"beeflow cancel --all"
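
As a rough illustration of how such a flag could be parsed, here is a standalone sketch using Python's standard argparse module. It is not the actual beeflow client code: `build_parser`, the `wf_id` positional argument, and the `kill_jobs` attribute name are assumptions made for the example, and only the "cancel --all" spelling comes from the suggestion above.

```python
import argparse


def build_parser():
    """Build a toy parser for a hypothetical 'beeflow cancel [--all]' command."""
    parser = argparse.ArgumentParser(prog="beeflow")
    sub = parser.add_subparsers(dest="command", required=True)

    cancel = sub.add_parser("cancel", help="cancel a workflow")
    # Assumption: the workflow is identified by a positional ID.
    cancel.add_argument("wf_id", help="ID of the workflow to cancel")
    # Without --all, running jobs are left to finish; with it, they are
    # cancelled as well.
    cancel.add_argument(
        "--all",
        dest="kill_jobs",
        action="store_true",
        help="also cancel jobs that are already running",
    )
    return parser
```

Making the flag opt-in keeps the current behavior (jobs continue) as the default, which matches the earlier request to let running jobs continue unless the user says otherwise.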
