Feature improve shutdown behavior. Closes https://github.com/Supervisor/supervisor/issues/1101 #1
Use Case / Context
Stopping a process via the CLI or the Supervisor UI should reliably stop the process and all of its child processes. If the processes refuse to stop (e.g. because they do not react to signals), they need to be forcefully stopped after a timeout. Additionally, only direct children of Supervisor are currently stopped properly. Grandchild processes are not stopped properly and need to be signaled as well.
Current Behavior
Supervisor handles different signals as stated here: http://supervisord.org/running.html#signals. When a TERM, INT or QUIT signal is received, Supervisor triggers its shutdown behavior. These signals can also be sent to the process group, but not all processes started from a program are necessarily in the same process group, as mentioned here: Supervisor#199 (comment). Supervisor only knows about the direct child processes that it started, not about any processes that those children spawned.
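To illustrate why signaling a process group is not enough on its own, here is a minimal sketch (not Supervisor code) that starts two children and checks whether each one shares the caller's process group. The helper names `spawn_sleeper`, `same_process_group` and `group_membership_demo` are made up for this example; it assumes a POSIX system.

```python
import os
import subprocess
import sys


def spawn_sleeper(new_session):
    """Start a child that just sleeps; optionally put it in its own session."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=new_session,  # calls setsid() in the child before exec
    )


def same_process_group(pid):
    """Return True if `pid` is in the same process group as the caller."""
    return os.getpgid(pid) == os.getpgid(0)


def group_membership_demo():
    """Return (inherited_in_group, detached_in_group) for two fresh children."""
    inherited = spawn_sleeper(new_session=False)
    detached = spawn_sleeper(new_session=True)
    try:
        return same_process_group(inherited.pid), same_process_group(detached.pid)
    finally:
        # Clean up both children regardless of the outcome.
        for proc in (inherited, detached):
            proc.kill()
            proc.wait()
```

A child that calls `setsid()` (or `setpgid()`) leaves the group it was started in, so a signal sent to that group never reaches it.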
Possible Solution:
This stopping behavior still needs some improvements.
Reliably stopping a process and all of its child processes also requires checking that the processes have actually stopped. If they refuse to stop (e.g. because they do not react to signals), they need to be forcefully stopped with a SIGKILL signal after a timeout. There are a couple of things to consider:
disable_force_shutdown_behavior
which would disable the mechanism of sending SIGKILL to the process and its children. We should define our usual behavior and allow users to alter it. A sensible default, I would say, is a round of SIGTERM to the children, followed after some time by a SIGKILL to those that are still alive.
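The default described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation in this PR: the function name `stop_process_group` is hypothetical, the `disable_force_shutdown_behavior` parameter merely mirrors the option name proposed here, and the child is assumed to have been started with `start_new_session=True` so that its pid is also its process group id. Only the direct child is waited on; group members are signaled but not tracked, and processes that left the group are not reached at all.

```python
import os
import signal
import subprocess


def stop_process_group(proc, timeout=10.0, disable_force_shutdown_behavior=False):
    """SIGTERM the child's process group, then SIGKILL it after `timeout`.

    Returns the child's return code (negative signal number on POSIX),
    or None if it is still running and forced shutdown is disabled.
    """
    pgid = proc.pid  # assumption: the child is its own process group leader

    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group is already gone

    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass  # the child did not react to SIGTERM in time

    if disable_force_shutdown_behavior:
        return None  # leave the process running, as the user requested

    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    return proc.wait()
```

A caller would start the program with `subprocess.Popen(..., start_new_session=True)` and pass the resulting object in. A return value of `-signal.SIGKILL` then indicates that the forced path was taken.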
Note, however, that a process can sometimes "hang" in the kernel (the so-called D state, see the ps man page). This is rare on a healthy system, but still possible. For those situations, I would say we should report it to the user somehow.
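One possible way to detect such a process before reporting it, sketched here for Linux only, is to read the state field from `/proc/<pid>/stat`. The helper names `parse_proc_state` and `is_uninterruptible` are hypothetical and nothing like this exists in Supervisor today. On systems without `/proc` the check simply returns `None`.

```python
def parse_proc_state(stat_line):
    """Extract the one-letter state from a /proc/<pid>/stat line.

    The second field (the command name) is wrapped in parentheses and may
    itself contain spaces or ')', so split on the last ')' instead of on
    whitespace.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def is_uninterruptible(pid):
    """Return True if `pid` is in D state, False if not, None if unknown."""
    try:
        with open(f"/proc/{pid}/stat", errors="replace") as stat_file:
            return parse_proc_state(stat_file.read()) == "D"
    except (FileNotFoundError, ProcessLookupError):
        return None  # no such process, or no /proc on this platform
```

If a process is still in D state after SIGKILL has been sent, a message to that effect could be logged instead of waiting indefinitely.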