Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleanup job for running containers #409

Open
christophprokop opened this issue Aug 21, 2024 · 2 comments · May be fixed by #422
Open

cleanup job for running containers #409

christophprokop opened this issue Aug 21, 2024 · 2 comments · May be fixed by #422
Labels
prio Issues with a higher priority

Comments

@christophprokop
Copy link
Collaborator

atm if a longer job is running there is no possibility to stop the running containers
CTRL-C does not stop the currently running containers/jobs but just gives you back the command prompt but the jobs keep running
maybe introduce some kind of cleanup job?
regular use case here: you want to add a minor change and rerun

@christophprokop christophprokop added the prio Issues with a higher priority label Aug 21, 2024
@ammernico
Copy link
Collaborator

Took a look into this. Currently, when pressing CTRL+C the process gets terminated with a SIGINT signal by Linux.
But we can use the ctrlc crate or the signal feature from tokio to handle the signal ourself.

The nicest way to clean up containers would probably be implementing drop on the structure which holds the job with the container id.

I already tried this and ran into issues with how tokio currently handles panics. When receiving the CTRL+C signal and trying to unwinding the program with std::process::ExitCode or calling down panic, just the current thread panics and other tokio threads continue to run.

This is wanted by tokio but there are open issues which try to implement custom behaviur for how to handle panics in threads.

tokio-rs/tokio#2002
tokio-rs/tokio#4516

The feature can already be used in tokio unstable. But the unstable version brings many other changes/issues and I would recommend waiting for it to become stable.

https://docs.rs/tokio/1.40.0/tokio/runtime/struct.Builder.html#method.unhandled_panic

@ammernico ammernico added the blocked Blocked by an external issue label Sep 6, 2024
@primeos-work
Copy link
Member

Nice catch. The current code use a bit of a fire-and-forget approach where the containers get started but then run to completion with no way to stop them (IIRC) - butido only streams the logs, updates the current status, saves artifacts, etc.

My idea with @ammernico was to use the RAII pattern to ensure that all containers that a butido-build command starts do also get stopped.

Unfortunately it wasn't quite that easy as Ctrl+c aborts the program so we need to implement a graceful shutdown. Ideally we can shutdown the Tokio runtime but if that doesn't work we probably have to implement that part ourselves (i.e., we should at least check if Ctrl-c was pressed during the butido-build part where the containers are running).

Some "random" notes:

@primeos-work primeos-work linked a pull request Oct 7, 2024 that will close this issue
ammernico added a commit to ammernico/butido that referenced this issue Oct 14, 2024
- Add the `signal` feature to `tokio` to interrupt and handle the
  Control-C signal in Butido.
- Add Control-C signal handling into the `Orchestrator`.
- Implement `Drop` on the `JobHandle` to ensure container cleanup.

Fixes science-computing#409

Signed-off-by: Nico Steinle <[email protected]>
@ammernico ammernico removed the blocked Blocked by an external issue label Oct 14, 2024
ammernico added a commit that referenced this issue Oct 16, 2024
- Add the `signal` feature to `tokio` to interrupt and handle the
  Control-C signal in Butido.
- Add Control-C signal handling into the `Orchestrator`.
- Implement `Drop` on the `JobHandle` to ensure container cleanup.

Fixes #409

Signed-off-by: Nico Steinle <[email protected]>
ammernico added a commit that referenced this issue Nov 5, 2024
- Add the `signal` feature to `tokio` to interrupt and handle the
  Control-C signal in Butido.
- Add Control-C signal handling into the `Orchestrator`.
- Implement `Drop` on the `JobHandle` to ensure container cleanup.

Fixes #409

Signed-off-by: Nico Steinle <[email protected]>
ammernico added a commit to ammernico/butido that referenced this issue Nov 11, 2024
- Add the `signal` feature to `tokio` to interrupt and handle the
  Control-C signal in Butido.
- Add Control-C signal handling into the `Orchestrator`.
- Implement `Drop` on the `JobHandle` to ensure container cleanup.

Fixes science-computing#409

Signed-off-by: Nico Steinle <[email protected]>
ammernico added a commit to ammernico/butido that referenced this issue Nov 13, 2024
- Add the `signal` feature to `tokio` to interrupt and handle the
  Control-C signal in Butido.
- Add Control-C signal handling into the `Orchestrator`.
- Implement `Drop` on the `JobHandle` to ensure container cleanup.

Fixes science-computing#409

Signed-off-by: Nico Steinle <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
prio Issues with a higher priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants