grub: core.img: Reboot if configfile fails #585
packages/grub/core.cfg
configfile /grub/grub.cfg
echo "boot failed (device ($boot_dev), uuid $boot_uuid)"
echo "trying again in 30 seconds..."
sleep 30
reboot
The GRUB manual does describe configfile returning... Are these new commands executed after a successful boot, i.e. after selecting a healthy entry, or at shutdown/reboot time?!
The documentation I see reads:
Load file as a configuration file. If file defines any menu entries, then show a menu containing them immediately. Any environment variable changes made by the commands in file will not be preserved after configfile returns.
The config file we load defines a menu entry with a default and a timeout of 0, so I believe GRUB boots that entry immediately and configfile never returns. (In practice that appears to be the case; systemctl halt does not reboot the host in QEMU.)
If GRUB attempts to load a config file from a boot partition that is empty or otherwise broken, it will sit forever at a rescue console, which is not particularly helpful in EC2.

gptprio.next decrements tries_left on partitions not yet marked successful, so rebooting will usually result in the correct behavior of rolling back.

Some diagnostic output is printed before sleeping for 30 seconds to help aid future debugging. No additional output is printed if the configfile boots successfully.

Signed-off-by: iliana destroyer of worlds <[email protected]>

We haven't been using this module since af72caf.

Signed-off-by: iliana destroyer of worlds <[email protected]>
(I brought up in the aisle that including a short URL that redirects to some documentation about why GRUB might have failed could be useful, once we know what this project is actually going to be called and we have a domain name.)
If GRUB attempts to load a config file from a boot partition that is empty or otherwise broken, it will sit forever at a rescue console, which is not particularly helpful in EC2.
gptprio.next decrements tries_left on partitions not yet marked successful, so rebooting will usually result in the correct behavior of rolling back.
Some diagnostic output is printed before sleeping for 30 seconds to aid future debugging. No additional output is printed if the configfile boots successfully.
This also drops search_fs_uuid from being built into core.img, as we haven't used that since af72caf.
x86_64's core.img size went from 129659 bytes to 130203 bytes with this change, an increase of 544 bytes.
Tested with QEMU on x86_64 with shell enabled by booting Thar, running signpost upgrade-to-inactive without writing anything to the disk, and rebooting. GRUB printed the diagnostic message, slept 30 seconds, and rebooted, which booted into set A. I didn't test aarch64 yet.
There's still an open question about whether we want to upgrade (and downgrade?) the GRUB core.img on older AMIs by (ab)using migrations.
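The rollback seen in this test can be modeled with a small simulation. This is only a sketch in Python, not the gptprio implementation: the `PartitionSet` fields, the `bootable` flag, the selection rule (highest priority among sets that are successful or still have tries left), and the starting values for sets A and B are all assumptions made for illustration. The one behavior taken from this PR is that tries_left is decremented on a set not yet marked successful.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionSet:
    name: str
    priority: int
    tries_left: int
    successful: bool
    bootable: bool  # hypothetical: would configfile find a working grub.cfg here?


def gptprio_next(sets: List[PartitionSet]) -> Optional[PartitionSet]:
    """Simplified, assumed model of gptprio.next selection."""
    candidates = [s for s in sets if s.successful or s.tries_left > 0]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda s: s.priority)
    # Behavior described in the PR: a set not yet marked successful
    # loses one try each time it is selected.
    if not chosen.successful:
        chosen.tries_left -= 1
    return chosen


def boot_until_success(sets: List[PartitionSet], max_reboots: int = 5) -> List[str]:
    """Return the names of the sets tried, in order, until one boots."""
    attempts: List[str] = []
    for _ in range(max_reboots):
        chosen = gptprio_next(sets)
        if chosen is None:
            break
        attempts.append(chosen.name)
        if chosen.bootable:
            return attempts
        # configfile returned: core.cfg prints diagnostics, sleeps, and reboots.
    return attempts


if __name__ == "__main__":
    # Assumed state after `signpost upgrade-to-inactive` with nothing written to B.
    set_a = PartitionSet("A", priority=1, tries_left=0, successful=True, bootable=True)
    set_b = PartitionSet("B", priority=2, tries_left=1, successful=False, bootable=False)
    print(boot_until_success([set_a, set_b]))  # prints ['B', 'A']
```

Under these assumptions, the first boot picks B and uses up its only try, configfile fails, and the reboot falls back to A. Without the reboot in core.cfg, the first attempt would instead stop at the rescue console.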
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.