Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve VM shutdown (from Incus) #13875

Merged
merged 3 commits into from
Sep 7, 2024

Conversation

hamistao
Copy link
Contributor

@hamistao hamistao commented Aug 5, 2024

Includes cherry picks from lxc/incus#362

@hamistao hamistao force-pushed the improve_reboot_experience branch 3 times, most recently from 26cac11 to ec85156 Compare August 5, 2024 13:10
Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

Could we have a test added in lxd-ci VM tests that checks for the correct exit codes, I believe we have something in the main lxd test suite for this already for containers (if not we should add that too) and would be good to have something or VMs as well so we can catch regressions. Ta

@tomponline
Copy link
Member

@hamistao ready for review?

@hamistao hamistao force-pushed the improve_reboot_experience branch 3 times, most recently from dad5939 to 2a6cf03 Compare August 6, 2024 12:34
@hamistao
Copy link
Contributor Author

hamistao commented Aug 6, 2024

@tomponline This is still missing working tests for 129 exit code in containers, as there weren't any before. I am still busy with the API metrics but I will return to this as soon as I finish the changes on the metrics PR.

@hamistao hamistao force-pushed the improve_reboot_experience branch from 2a6cf03 to be41b44 Compare September 3, 2024 14:20
@@ -289,7 +290,14 @@ func (s *execWs) Do(op *operations.Operation) error {
_ = pty.Close()
}

// Make VM disconnections (shutdown/reboot) match containers.
if cmdErr == drivers.ErrExecDisconnected {
cmdResult = 129
Copy link
Member

@mihalicyn mihalicyn Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exit code is 128+SIGHUP (=1) = 129. Looks good to me.

see also, https://tldp.org/LDP/abs/html/exitcodes.html

mihalicyn
mihalicyn previously approved these changes Sep 5, 2024
@tomponline
Copy link
Member

@hamistao ready for a review, is there an accompanying test for this in lxd-ci?

@hamistao
Copy link
Contributor Author

hamistao commented Sep 5, 2024

@mihalicyn Thanks for the review, but my main question was regarding the failing tests. Although the VM case is currently working as expected, container disconnections result in a 137 exit code instead of 129 on my local tests. I wanted to confirm if this behavior can be considered a bug and if just catching this exit code and changing it to 129 would be an adequate solution or if there could be a deeper problem involved.

Also, the GH tests behave differentely from my local tests and return a 143 exit code, @simondeziel discovered that this exit code is also what we get when killing the agent process inside the instance to make it disconnect.

@hamistao
Copy link
Contributor Author

hamistao commented Sep 6, 2024

@tomponline This is ready for a review, the PR containing the tests was also opened here. When both are merged I will also add tests for containers on the LXD repo.

stgraber and others added 3 commits September 6, 2024 18:05
Signed-off-by: Stéphane Graber <[email protected]>
Signed-off-by: hamistao <[email protected]>
License: Apache-2.0
Closes lxc/incus#256

Signed-off-by: Stéphane Graber <[email protected]>
Signed-off-by: hamistao <[email protected]>
License: Apache-2.0
When the VM disconnects due to a stop/reboot, the socket is closed and the
error we get is a `connection reset by peer` or a `Unexpected EOF` instead
of a EOF.
So this helps handle these errors similarly to make the command exit
cleanly in these scenarios.
Since both errors are of type errorString, the only way to check it
is to see if they contains the expected substring.

Signed-off-by: hamistao <[email protected]>
@hamistao hamistao force-pushed the improve_reboot_experience branch from d3874bc to 0f5be4b Compare September 6, 2024 21:05
@tomponline tomponline merged commit dbefa5e into canonical:main Sep 7, 2024
29 checks passed
@tomponline tomponline changed the title Improve VM shutdown Improve VM shutdown (from Incus) Sep 7, 2024
tomponline added a commit to canonical/lxd-ci that referenced this pull request Sep 18, 2024
As was discussed with @mihalicyn If containers are stopped gracefully,
the exit code can be 129(SIGHUP) or 143(SIGTERM) depending on the signal
used, both results have been seen and make sense. If forcefully stopping
a container the exit code is always 137 (SIGKILL). For VMs, the exit
code was standardized as 129 for all cases by
[#13875](canonical/lxd#13875) (not merged yet).

In its current state, this PR only adds tests for forcefully stopping
containers and VMs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants