Consider disabling emergency shell timeout and reboot if an error is hit in the initramfs on first boot #928
Comments
In general, Ignition is expected to be idempotent. The transposefs glue is very much not, though, and Ignition has gaps around file appending (coreos/ignition#642) and apparently also LUKS keyfiles. Also, on the RHCOS side I have been told that at least one customer depends on the reboot semantics.
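To illustrate why file appending is a problem when provisioning is re-run after a reboot, here is a minimal sketch. It is not Ignition's code; `append_once` is a hypothetical helper, and the "skip if the line is already present" rule is only one possible way to make an append safe to repeat.

```python
from pathlib import Path


def append_once(path, line):
    """Append `line` to `path` unless it is already there (hypothetical helper).

    A plain append run twice leaves the line in the file twice. Checking
    first makes a second provisioning attempt a no-op.
    Returns True if the file was modified, False otherwise.
    """
    p = Path(path)
    text = p.read_text() if p.exists() else ""
    if line in text.splitlines():
        return False  # already applied on an earlier attempt
    with p.open("a") as f:
        # Avoid gluing the new line onto an unterminated last line.
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True
```

Calling `append_once` a second time with the same arguments leaves the file unchanged, which is the property a retried first boot would need.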
The automatic reboot was originally implemented in CL to give GRUB a chance to fall back to the known-good OS release after an update failure. We don't have such code in our GRUB, and retrying provisioning at the OS level does seem likely to exercise under-tested code paths. That still leaves the issue of users who have taken a dependency on the current behavior. One intermediate option is to lock out the automatic reboot if transposefs has been engaged.
Right yeah, the initramfs is much more distro glue than it is Ignition at this point. And none of it really accounts for half-provisioned systems, and new code going forward likely won't either. So I don't think we should scope this to just transposefs. If there are people who depend on the current behaviour, we should find out why and fix the underlying issue (e.g. by continuing the recent trend of just retrying operations forever on transient errors). Re. automatic rollback: note that I'm suggesting we do this only for the first boot, because it's special to us. So there'd be nothing to roll back to anyway.
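As a rough sketch of what "retrying operations forever on transient errors" could look like, assuming the caller can tell transient failures apart from permanent ones: `TransientError`, `retry_forever`, and the backoff values below are hypothetical names and numbers for illustration, not anything taken from Ignition or the initramfs glue.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a network timeout)."""


def retry_forever(operation, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call `operation` until it succeeds, backing off between attempts.

    Only TransientError is retried. Any other exception propagates
    immediately so that real provisioning bugs stay visible.
    """
    delay = base_delay
    while True:
        try:
            return operation()
        except TransientError:
            sleep(delay)
            # Exponential backoff, capped so waits never grow unbounded.
            delay = min(delay * 2, max_delay)
```

The design point is that the operation itself is retried in place, rather than rebooting and re-running the whole first boot on a half-provisioned system.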
This is related to coreos/ignition-dracut#137
This was discussed in today's community meeting:
I've talked to the person I originally heard this from, and they were unable to track down the reference. So I don't have anything concrete to offer here. At this point I think we should drop the automatic reboot on all boots, not just the first boot. The current behavior hides intermittent boot bugs, and the main reason to keep it is to avoid uncovering them. Let's just take the leap and fix the bugs.
The reboot, and consequently the timeout, masked valuable debug information. The reboot also caused some cascading errors because the system would try to run as if all required dependencies had been satisfied during the first boot. The issue can be found at coreos/fedora-coreos-tracker#928
The reboot, and consequently the timeout, masked valuable debug information. The reboot also caused some cascading errors because the system would try to run as if all required dependencies had been satisfied during the first boot. Closes coreos/fedora-coreos-tracker#928.
@dustymabe We missed labeling this one for releases. I'll look into which one it went into.
Right now if there's an error in the initramfs, we get:
But rebooting may hide important error messages and then the next boot may fail in a different way due to firstboot assumptions being violated.
We should just disable that timeout and maybe even automatically enter the emergency shell.