Container errors on restart #3352
Comments
The 3 issues preventing restart should be fixed with #3356. The bottom line is that when containerd restarts, calling start on previously running containers will actually make them go through onCreateRuntime again. This is unexpected for me - as the normal stop/start flow does NOT do that - and very likely unexpected for other contributors as well. We should take a cold, hard look at what is going on inside onCreateRuntime and make sure we account for the fact that it may be run multiple times for a single container without ever hitting onPostStop. Another concerning issue is the (unfixed) #3357, which may be a runc issue. I cannot think of a simple workaround for it, and it will bite us again the next time we have failures in onCreateRuntime.
#3356 and #3362 addressed a slew of issues and make us more resistant to unexpected conditions. I am going to close this, as I am now able to restart without errors. One issue remains that will still break the namestore (#3357); though problematic, it should not happen under normal conditions and possibly requires an upstream fix, so it should be addressed separately.
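To illustrate what "safe to run multiple times without ever hitting onPostStop" could look like, here is a minimal Go sketch. It is not the nerdctl implementation: `hookState`, `onCreate`, and `onPostStop` are hypothetical names, and the in-memory map only stands in for whatever per-container resources the real hook reserves (such as a name in the namestore).

```go
package main

import (
	"fmt"
	"sync"
)

// hookState is a hypothetical stand-in for resources reserved per container
// by a create hook. It is NOT the real nerdctl namestore.
type hookState struct {
	mu       sync.Mutex
	reserved map[string]string // container ID -> reserved name
}

func newHookState() *hookState {
	return &hookState{reserved: map[string]string{}}
}

// onCreate reserves name for the container id. Calling it again for the same
// id succeeds, so a second run without an intervening onPostStop is harmless.
func (s *hookState) onCreate(id, name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for otherID, n := range s.reserved {
		if n == name && otherID != id {
			return fmt.Errorf("name %q is already used by container %s", name, otherID)
		}
	}
	s.reserved[id] = name
	return nil
}

// onPostStop releases whatever onCreate reserved; it is a no-op for unknown ids.
func (s *hookState) onPostStop(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.reserved, id)
}

func main() {
	s := newHookState()
	fmt.Println(s.onCreate("abc", "web")) // first start
	fmt.Println(s.onCreate("abc", "web")) // restart, onPostStop never ran
	fmt.Println(s.onCreate("def", "web")) // a different container wanting the same name
}
```

The design choice here is that ownership is keyed by container ID: a repeat call from the same container is treated as a no-op, while a genuine conflict from another container still fails.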
Description
This is a variant of #3350
Containers cannot be restarted after being shut down by containerd stopping, and are generally left in a broken state.
This is against containerd v1.7 (unlike #3350, which was testing against containerd v2).
Steps to reproduce the issue
Reproduction is:
Then
or
Describe the results you received and expected
There are clearly multiple issues.
First is:
This issue affects only main (and not 1.7). I have a local patch for that, which I will send shortly.
Second is:
This is definitely coming from https://github.com/containernetworking/plugins/blob/main/plugins/ipam/host-local/backend/allocator/allocator.go#L83
This has been there for some time and affects both 1.7 and main.
This needs discussion.
Should we modify the allocator over there to return the already allocated IP instead of failing?
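As a discussion aid, here is a minimal Go sketch of the "return the already allocated IP" behavior. It does not use or reproduce the host-local allocator's code or API: the `allocator` type, its `get` and `release` methods, and the string-based pool are all hypothetical and only model the idea.

```go
package main

import (
	"errors"
	"fmt"
)

// allocator is a hypothetical, in-memory IP pool keyed by container ID.
// It is NOT the CNI host-local allocator.
type allocator struct {
	free []string          // addresses not yet handed out
	byID map[string]string // container ID -> allocated address
}

func newAllocator(pool []string) *allocator {
	return &allocator{
		free: append([]string(nil), pool...), // copy so the caller's slice is untouched
		byID: map[string]string{},
	}
}

// get returns the address already held by id if there is one, instead of
// failing; otherwise it hands out the next free address.
func (a *allocator) get(id string) (string, error) {
	if ip, ok := a.byID[id]; ok {
		return ip, nil
	}
	if len(a.free) == 0 {
		return "", errors.New("no IP addresses available in pool")
	}
	ip := a.free[0]
	a.free = a.free[1:]
	a.byID[id] = ip
	return ip, nil
}

// release returns id's address to the pool; unknown ids are ignored.
func (a *allocator) release(id string) {
	if ip, ok := a.byID[id]; ok {
		delete(a.byID, id)
		a.free = append(a.free, ip)
	}
}

func main() {
	a := newAllocator([]string{"10.4.0.2", "10.4.0.3"})
	first, _ := a.get("abc")
	again, _ := a.get("abc") // second request for the same container
	fmt.Println(first, again, first == again)
}
```

One trade-off worth discussing: returning the existing address hides a missed release, so a real change would probably want to log or otherwise surface the repeat request.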
Third is:
stop cannot find the container Task; it returns "container not found".
This is probably widespread in our codebase, and other commands may also fail for the same reason.
Issues 2 and 3 might be related.
I'll look into these and figure out whether we can fix them or work around them, then test with different network types, reboots, and also containerd v2.
cc @AkihiroSuda we should flag this as urgent - although this is apparently not new, this is a pretty bad set of issues.
What version of nerdctl are you using?
main
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
None
Host information