Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardening lifecycle-state-store, name-store, and oci-hooks #3362

Merged
merged 3 commits into from
Aug 29, 2024

Conversation

apostasie
Copy link
Contributor

@apostasie apostasie commented Aug 24, 2024

First commit

Although #3350 got closed as a fluke, the error message seen in the logs clearly shows that the lifecycle state file could end-up messed-up, triggering a failure.

This does two things:

  • write atomically, so that we never end-up with half-written / hosed JSON in the file
  • tolerate load errors: this file is not critical and purely informative and should soft fail with a warning instead

This will make the LifecycleState more robust and resistant to host shutdowns.

Note: we should consider a review of all our filesystem based stores and consider moving to atomic writing for them.

Second commit

Fix #3351 by making us resistant to the condition and allowing a container to acquire a "broken" name.

This obviously should never happen, but it does ^.

Third commit

Fix #3357 - does not fix the underlying issue, which is possibly upstream - but effectively mitigate it.
Since onPostStop is (wrongly?) called after an error in onCreateRuntime - while the container will NOT be deleted - we inhibit the normal cleanup flow in that case.

@apostasie apostasie marked this pull request as draft August 24, 2024 22:40
@apostasie apostasie force-pushed the hardening-lifecycle-state-3350 branch from 0d828c9 to cf93ff9 Compare August 24, 2024 23:43
@apostasie apostasie changed the title Hardening lifecycle state store Hardening lifecycle-state-store and name-store Aug 24, 2024
@apostasie apostasie force-pushed the hardening-lifecycle-state-3350 branch from 89804a3 to baf2e52 Compare August 24, 2024 23:53
@apostasie apostasie mentioned this pull request Aug 25, 2024
3 tasks
@apostasie apostasie marked this pull request as ready for review August 25, 2024 00:18
@apostasie apostasie changed the title Hardening lifecycle-state-store and name-store Hardening lifecycle-state-store, name-store, and oci-hooks Aug 25, 2024
@apostasie apostasie force-pushed the hardening-lifecycle-state-3350 branch from a51aca2 to f31c4ff Compare August 25, 2024 04:21
@apostasie
Copy link
Contributor Author

Failure is the usual IPFS/compose.

@apostasie apostasie force-pushed the hardening-lifecycle-state-3350 branch from f31c4ff to f48fac3 Compare August 27, 2024 16:42
@apostasie
Copy link
Contributor Author

The Kube test (and underlying condition) is clearly racy.
This test is getting annoying (especially since right now we are just "confirming that a certain bug exists").
We might want to disable it for now (or fix the issue...). cc @lingdie

@apostasie
Copy link
Contributor Author

Grumble grumble.
Interactive rebase...
Should be ready in a second.

@apostasie apostasie force-pushed the hardening-lifecycle-state-3350 branch from f48fac3 to 8e74cde Compare August 27, 2024 17:05
@apostasie apostasie force-pushed the hardening-lifecycle-state-3350 branch from 8e74cde to b954c7a Compare August 27, 2024 17:12
@apostasie
Copy link
Contributor Author

@AkihiroSuda CI is green. Mistake addressed.

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@AkihiroSuda AkihiroSuda merged commit 1a55c72 into containerd:main Aug 29, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants