Add a service to call signpost when we determine that the boot was successful #481
At a high level, I have a hard time understanding why three unit files are needed and how they relate to each other; I think it'd be good to put a brief explanation in the files themselves.
Jamie and I had some further out-of-band discussion based on my questions above. One point that helps clarify is that the …
Jamie and I had yet another round of out-of-band discussion. We landed on the idea of making kubelet (or any other service we deem required for boot) retry a set number of times per failure incident rather than forever. That would let services such as mark-successful-boot depend on them normally, and the system status would be degraded if they fail, rather than requiring out-of-band methods like timers and …
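A minimal sketch of the bounded-retry idea, assuming systemd's standard restart and start-rate-limit settings are used to implement it. The drop-in file name and the specific numbers (3 starts, 300 s window, 10 s delay) are hypothetical, chosen only for illustration.

```ini
# Hypothetical drop-in: kubelet.service.d/10-restart-limit.conf
[Unit]
# Allow at most 3 starts within 300 seconds; a further start attempt
# is refused and the unit enters the failed state (start-limit-hit).
StartLimitIntervalSec=300
StartLimitBurst=3

[Service]
# Restart automatically after a failure, waiting 10 seconds between attempts.
Restart=on-failure
RestartSec=10
```

Once the limit is hit the unit stays failed, so anything that requires it sees a dependency failure, and `systemctl is-system-running` reports `degraded`, which is the visible signal the comment above is after.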
676ac97 to a4b4ed4
Removed measure-successful-boot.timer and measure-successful.service. This reduces the scope of this change to calling signpost if the boot succeeded. Rolling back and observing or reporting on the health of the host will be implemented in future changes.
I'd like to see the description updated to reflect the new smaller scope, and the little cleanup with the `Wants` line, but no blockers from me!
a4b4ed4 to f25198b
Removed the …
This implements a method to determine whether the host booted successfully and to call signpost once it has. It consists of a new unit in the updater, mark-successful-boot.service.
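A minimal sketch of what a unit like mark-successful-boot.service could look like. The signpost binary path, the `mark-successful-boot` subcommand, and the `WantedBy=multi-user.target` hook are assumptions for illustration, not taken from this PR.

```ini
# mark-successful-boot.service (sketch)
[Unit]
Description=Mark the current boot as successful
# No explicit dependencies here: services whose failure should fail the
# boot attach themselves via RequiredBy= (and ordering) in their own units.

[Service]
# Run signpost once and keep the unit active afterwards.
Type=oneshot
RemainAfterExit=true
# Assumed invocation; check the signpost CLI shipped with the updater.
ExecStart=/usr/bin/signpost mark-successful-boot

[Install]
# Assumed target that pulls this unit into the boot transaction.
WantedBy=multi-user.target
```

Keeping this unit free of references to specific services means onboarding a new required service only touches that service's own unit file.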
In addition to this new service, changes are required to any service whose failure to start should fail the boot:

- It must use `Type=notify`, to ensure that it does not transition to Active if it fails and systemd restarts it. This could require patches for services that don't already support sd_notify.
- It must have `RequiredBy=mark-successful-boot.service` in its [Install] section.

This change includes onboarding kubelet.service to this method of boot measurement.
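A sketch of the relevant lines in an onboarded unit such as kubelet.service, assuming the two requirements above. The `Before=` line is our own assumption: `RequiredBy=` only adds a `Requires=` dependency, which in systemd does not imply any ordering, so something must make mark-successful-boot.service wait for the service to finish starting.

```ini
# kubelet.service (excerpt; all other settings omitted)
[Unit]
# Assumption: order this unit before the boot marker so the marker's
# start waits until kubelet has reported readiness.
Before=mark-successful-boot.service

[Service]
# The unit counts as started only after the process sends READY=1
# via sd_notify; kubelet may need a patch to do so, as noted above.
Type=notify

[Install]
# When the unit is enabled, mark-successful-boot.service gains
# Requires=kubelet.service, so a failed kubelet start causes the
# marker to fail with a dependency error instead of calling signpost.
RequiredBy=mark-successful-boot.service
```

Because `RequiredBy=` takes effect through the symlink created when the unit is enabled, the service must actually be enabled for the dependency to exist.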
Fixes Issue #86
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.