Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature improve shutdown behavior. Closes https://github.com/Supervisor/supervisor/issues/1101 #1

Closed
wants to merge 5 commits into from

Conversation

michaelhammann
Copy link
Owner

Use Case/ Context
Stopping a process via CLI or supervisor UI should stop the process and all it's child processes reliable. If the processes refuse to stop (e.g. not reacting to signals), they need to be forcefully stopped after some timeout. Additionally only direct child's of supervisor are properly stopped. Grandchild processes are not properly stopped and net be signaled as well.

Current Behavior
Supervisor handles different signals as stated here: http://supervisord.org/running.html#signals When a TERM, INT or QUIT signal is received, supervisor will trigger the shutdown behavior. This signals can also be sent to the process group, but not all processes started from a program are necessarily in the same process group, as mentioned here: Supervisor#199 (comment). Supervisor only knows about the direct child processes that it started but not any processes which any of the child processes spawned.

Possible Solution:
This stopping behavior still needs some improvements.
Stopping a process and all it's child processes reliable also needs a check if the processes are actually stopped and if the processes refuse to stop (e.g. not reacting to signals), they need to be forcefully stopped after some timeout via SIGKILL signal. There are a couple of things to consider:

  • think about a good timeout. Systemd uses about 2-3 mins, for my use-cases 60 seconds should be sufficient.
  • SIGKILL can be very intrusive. For example when processes are killed which are doing some operations on a database, this can cause loss of data.
  • There should be an additional config parameter such as disable_force_shutdown_behavior which would disable the mechanism of sending SIGKILL to the process and its childs.

We should define our usual behavior and allow users to alter this behavior. What's sensible is a default behavior -- I would say a round of SIGTERM to children, and after some time sending a SIGKILL to those who still alive is the ideal solution.

Note, however, that sometimes process can "hang" in the kernel (or as they call it D-state, see ps man page). It's rare in a healthy system, but still possible. For those situation, I would say we should report it to the user somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant