-
Notifications
You must be signed in to change notification settings - Fork 935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve VM shutdown (from Incus) #13875
Conversation
26cac11
to
ec85156
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
Could we have a test added in lxd-ci VM tests that checks for the correct exit codes, I believe we have something in the main lxd test suite for this already for containers (if not we should add that too) and would be good to have something or VMs as well so we can catch regressions. Ta
@hamistao ready for review? |
dad5939
to
2a6cf03
Compare
@tomponline This is still missing working tests for 129 exit code in containers, as there weren't any before. I am still busy with the API metrics but I will return to this as soon as I finish the changes on the metrics PR. |
2a6cf03
to
be41b44
Compare
@@ -289,7 +290,14 @@ func (s *execWs) Do(op *operations.Operation) error { | |||
_ = pty.Close() | |||
} | |||
|
|||
// Make VM disconnections (shutdown/reboot) match containers. | |||
if cmdErr == drivers.ErrExecDisconnected { | |||
cmdResult = 129 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exit code is 128+SIGHUP (=1)
= 129
. Looks good to me.
@hamistao ready for a review, is there an accompanying test for this in lxd-ci? |
@mihalicyn Thanks for the review, but my main question was regarding the failing tests. Although the VM case is currently working as expected, container disconnections result in a 137 exit code instead of 129 on my local tests. I wanted to confirm if this behavior can be considered a bug and if just catching this exit code and changing it to 129 would be an adequate solution or if there could be a deeper problem involved. Also, the GH tests behave differentely from my local tests and return a 143 exit code, @simondeziel discovered that this exit code is also what we get when killing the agent process inside the instance to make it disconnect. |
be41b44
to
8e7b0ae
Compare
8e7b0ae
to
d3874bc
Compare
@tomponline This is ready for a review, the PR containing the tests was also opened here. When both are merged I will also add tests for containers on the LXD repo. |
Signed-off-by: Stéphane Graber <[email protected]> Signed-off-by: hamistao <[email protected]> License: Apache-2.0
Closes lxc/incus#256 Signed-off-by: Stéphane Graber <[email protected]> Signed-off-by: hamistao <[email protected]> License: Apache-2.0
When the VM disconnects due to a stop/reboot, the socket is closed and the error we get is a `connection reset by peer` or a `Unexpected EOF` instead of a EOF. So this helps handle these errors similarly to make the command exit cleanly in these scenarios. Since both errors are of type errorString, the only way to check it is to see if they contains the expected substring. Signed-off-by: hamistao <[email protected]>
d3874bc
to
0f5be4b
Compare
As was discussed with @mihalicyn If containers are stopped gracefully, the exit code can be 129(SIGHUP) or 143(SIGTERM) depending on the signal used, both results have been seen and make sense. If forcefully stopping a container the exit code is always 137 (SIGKILL). For VMs, the exit code was standardized as 129 for all cases by [#13875](canonical/lxd#13875) (not merged yet). In its current state, this PR only adds tests for forcefully stopping containers and VMs.
Includes cherry picks from lxc/incus#362