grub: extend boot prompt timeout to 5 seconds #1263

Closed
wants to merge 1 commit into from

dustymabe
Member

At 1 second it's almost impossible to catch the boot prompt if you need
to change the kernel command line parameters. Let's extend it to 5
seconds so users have a fighting chance to catch the prompt.

This follows from a similar change made to the Live ISO:
coreos/fedora-coreos-config#281
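
(Illustrative sketch, not part of the PR: assuming the prompt timeout is controlled by a `set timeout=N` line in the image's GRUB config, which this thread does not show, the change amounts to rewriting that one value. `set_grub_timeout` is a hypothetical helper, not a coreos-assembler function.)

```python
import re

# Matches a line such as "set timeout=1" (optionally indented).
# Only spaces/tabs are allowed around it so the line's newline is preserved.
_TIMEOUT_LINE = re.compile(r"^([ \t]*set[ \t]+timeout=)-?\d+[ \t]*$", re.MULTILINE)


def set_grub_timeout(cfg_text: str, seconds: int) -> str:
    """Return cfg_text with every `set timeout=N` line set to `seconds`."""
    new_text, count = _TIMEOUT_LINE.subn(
        lambda m: f"{m.group(1)}{seconds}", cfg_text
    )
    if count == 0:
        # Assumption: a config without a timeout line is an error for this sketch.
        raise ValueError("no `set timeout=` line found")
    return new_text


# Hypothetical two-line config, used only to exercise the helper.
example_cfg = "set default=0\nset timeout=1\n"
print(set_grub_timeout(example_cfg, 5))
```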

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dustymabe

@cgwalters
Member

Kind of torn... see e.g. this thread where @crawford was wary about another 5 seconds in node bootup.

We could make this platform specific? E.g. only do it on metal and vmware to start?

@cgwalters
Member

(And then on AWS make it zero)

@cgwalters
Member

I use e.g. cosa run and kola spawn a lot to get a quick shell, and our bootup is already really slow compared to other distributions; we do a lot in the initramfs and hook in some nontrivial things to multi-user.target including starting zincati/rpm-ostree. This would be a noticeable hit to boot time for that scenario too.

@dustymabe
Member Author

> Kind of torn... see e.g. this thread where @crawford was wary about another 5 seconds in node bootup.

yeah I agree I'm torn too. Extending boot time isn't great.

> We could make this platform specific? E.g. only do it on metal and vmware to start?

I'd like to have it for all platforms where people can reasonably get to the console of the machine. This happens to be one of the ways (and maybe the only way) to roll back a machine when an upgrade failure has left your system borked (where the system being borked could have a lot of different meanings).

Maybe I can make it platform specific, but it would be worth us listing the platforms where users can reasonably get to a console:

  • qemu
  • openstack
  • vmware
  • bare metal
  • which clouds??

@bgilbert
Contributor

> which clouds??

At least GCP, Azure, DO, and Packet. Not AWS.

@jlebon
Member

jlebon commented Mar 19, 2020

Do all those clouds have a CLI for accessing the console though? I think Packet at least does, but otherwise if it's through the cloud's web UI, being able to access the console right at the start within a 5s window wouldn't be trivial. (My argument being let's not do this if it's actually not usable anyway.)

I'm guessing this is mostly about network args, right? (And I guess in the bare metal case, installer kargs, though we should be emphasizing the CLI path now: coreos/fedora-coreos-docs#26). In which case, can we limit this to just where that's relevant? Most clouds shouldn't really need network karg tweaks on first boot.

@dustymabe
Member Author

dustymabe commented Mar 19, 2020

> I'm guessing this is mostly about network args, right?

I don't think so. I think it's about any karg you'd want to add ephemerally because of need (debugging problems) or test (exploring features). Also, choosing an older bootentry in the case you need to.

@bgilbert
Contributor

> if it's through the cloud's web UI, being able to access the console right at the start within a 5s window wouldn't be trivial.

The consoles don't generally close when the instance restarts, so it's still possible to catch the GRUB prompt on reboot.

@cgwalters
Member

How much would we want this if we had Ignition support for kernel arguments?

@dustymabe
Member Author

> How much would we want this if we had Ignition support for kernel arguments?

Ignition support for kargs would set them persistently, right? In #1263 (comment) I argue more that the value of this is for ephemeral arguments.

@cgwalters
Member

> Ignition support for kargs would set them persistently, right? In #1263 (comment) I argue more that the value of this is for ephemeral arguments.

Can you describe a more specific use case for ephemeral? I can imagine it but it helps to have it written down.

That said, we could invent ephemeral arguments too, though there are some interesting twists there.
(It might be most easily done via ostreedev/ostree#435.)

@dustymabe
Member Author

> Can you describe a more specific use case for ephemeral? I can imagine it but it helps to have it written down.

I'm sure we could name many more

@cgwalters
Member

Right, though if this is for local development/testing, particularly with e.g. cosa run, it would be pretty easy to add cosa run --kargs="systemd.log_level=debug systemd.log_target=console" similarly to #1219

@cgwalters
Member

In fact I'm just going to do that now

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Mar 19, 2020
For the use case of enabling systemd debugging, etc.
coreos#1263 (comment)
@cgwalters
Member

Done in #1265

(with the usual requisite duplication between cosa and mantle...it's like some sort of strange recurring pattern)

@cgwalters
Member

Now, using "easy to do w/qemu" tricks that don't work in other places is a tradeoff, because it does make debugging an issue elsewhere harder. But, IMO we should be making an OS that works absolutely perfectly in qemu - there's no excuse not to do so, particularly for early bootup stuff.

@cgwalters
Member

> The consoles don't generally close when the instance restarts, so it's still possible to catch the GRUB prompt on reboot.

If the use case includes debugging on reboot, then Ignition support for kargs would work right?

Another topic here - on bare metal at least...is there some sort of standard way one can ask the bootloader to display a menu via the UEFI firmware? Because it is a bit silly to have a timeout "press f12 to enter setup" or whatever, then another timeout "press a key to edit the bootloader".

@dustymabe
Member Author

> Right, though if this is for local development/testing,

I'm really not too concerned about that case with this proposed change. As developers we can hack and slash images to do whatever we want. I'm concerned about the user experience, which will make our lives easier too. If I'm helping someone debug and I need to ask them to add a kernel command line parameter, then explaining to them they only have 1 second to catch the prompt and whatever interface they're using might not even give them a chance at all to hit it in that time is bad for them and for us. As a user, I might consider moving to another offering.

@cgwalters
Member

cgwalters commented Mar 20, 2020

> As developers we can hack and slash images to do whatever we want.

Well yes, but I can say for sure I've often done cosa run + "catch grub prompt" for things because...it's easier than rebuilding a new image or hacking the image manually. And that would be unnecessary after #1265

> If I'm helping someone debug and I need to ask them to add a kernel command line parameter, then explaining to them they only have 1 second to catch the prompt

Right. OK first, this github issue isn't the first time the bootloader timer has been discussed 😉
In fact https://fedoraproject.org/wiki/Changes/HiddenGrubMenu is highly relevant here.

Second, I get the use case but I'm arguing for platform specifics and more sophistication/thought.

In particular a bottom line for me is: We should set the bootloader timeout to zero in AWS, because anything else makes no sense at all.

@jlebon
Member

jlebon commented Mar 20, 2020

One idea for ephemeral args is to have a file in /boot where you can write them down. On reboot, the GRUB config reads that file in and appends it to the list. We then delete the file during boot. Heck, we could even make this part of rpm-ostree kargs --once or something. (But note even now, although cumbersome, it should be totally fine to rpm-ostree kargs --append foobar and then delete it after you're done with it.)

For the "select an older boot entry", what's the scenario you're thinking of? If the boot still works enough to SSH in, then you can rpm-ostree rollback. If the boot is completely broken, then I think that's where we want coreos/fedora-coreos-tracker#47 (which yeah, we need to push forward on...).
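
(Illustrative sketch of the one-shot kargs file idea above, not an existing feature: in the proposal GRUB would read the file and a later boot step would delete it; here both steps are collapsed into one function to show the consume-once semantics. The file name `kargs-once` and the function `consume_once_kargs` are hypothetical.)

```python
from pathlib import Path


def consume_once_kargs(boot_dir: Path, base_kargs: list[str],
                       filename: str = "kargs-once") -> list[str]:
    """Return base_kargs plus any one-shot kargs, deleting the one-shot file.

    Assumption: the file holds whitespace-separated kernel arguments.
    """
    once_file = boot_dir / filename
    if not once_file.exists():
        # No one-shot file: boot with the persistent arguments only.
        return list(base_kargs)
    extra = once_file.read_text().split()
    # Remove the file so the extra arguments apply to a single boot.
    once_file.unlink()
    return list(base_kargs) + extra
```

Calling it twice against the same directory returns the extra arguments only the first time, which is the behavior the proposal is after.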

@jlebon
Member

jlebon commented Mar 20, 2020

Even simpler is a one-boot stamp file e.g. like /boot/grub-sleep which makes GRUB sleep a little longer.

@dustymabe
Member Author

dustymabe commented Mar 20, 2020

> As developers we can hack and slash images to do whatever we want.

> Well yes, but I can say for sure I've often done cosa run + "catch grub prompt" for things, and that would be unnecessary after #1265

I support #1265 - thanks for that.

> In particular a bottom line for me is: We should set the bootloader timeout to zero in AWS, because anything else makes no sense at all.

I'm with you. I support having a 0 timeout on platforms where it's practically impossible to catch the grub prompt. If we can agree on the platforms where it's impossible to get to a grub prompt (no console access) then I can try to rework this to take that into account and have 0 for those platforms and 5 for the ones where you can access it.

@dustymabe
Member Author

> One idea for ephemeral args is to have a file in /boot where you can write them down. On reboot, the GRUB config reads that file in and appends it to the list. We then delete the file during boot. Heck, we could even make this part of rpm-ostree kargs --once or something. (But note even now, although cumbersome, it should be totally fine to rpm-ostree kargs --append foobar and then delete it after you're done with it.)

I appreciate the ideas here, but I don't think there is anything we're going to do to cover all cases where someone is going to need to access the grub prompt/kernel command line. @jlebon you even wrote this: https://docs.fedoraproject.org/en-US/fedora-coreos/access-recovery/

It feels like we are trying to over-engineer this.

> For the "select an older boot entry", what's the scenario you're thinking of? If the boot still works enough to SSH in, then you can rpm-ostree rollback. If the boot is completely broken, then I think that's where we want coreos/fedora-coreos-tracker#47 (which yeah, we need to push forward on...).

if the boot still works enough to SSH in - yeah that's great if it does, but who says that it will in every case?

Even if we had automatic rollback working I'd still want to be able to get the grub prompt just in case.

@jlebon
Member

jlebon commented Mar 20, 2020

> I appreciate the ideas here, but I don't think there is anything we're going to do to cover all cases where someone is going to need to access the grub prompt/kernel command line.

I think what bothers me is that (1) we're slowing down every boot across almost all platforms, and (2) the boot menu is not a very nice interface. So covering the major use cases via a nicer UX is a double win. And a sleep stamp file is a catch all for anything else.

Anyway, would it be unreasonable to start with just bare metal and VMware as suggested higher up (since that's where you expect bootup to take longer anyway)?

@dustymabe
Member Author

Yeah I'm cool with only implementing it on a subset of platforms and then we can extend that mechanism based on further conversations we have.

So I'll start trying to make this generic so one can specify the timeout per platform.
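
(Illustrative sketch only, not the implementation that was planned: a per-platform timeout table reflecting where the thread had landed so far, with 0 on AWS, 5 on bare metal and VMware as a starting subset, and the existing 1 second everywhere else until more platforms are agreed on. The platform IDs and the function name are assumptions.)

```python
# Current timeout according to the PR description.
DEFAULT_TIMEOUT_SECONDS = 1

# Hypothetical platform IDs; the values follow the discussion above.
PLATFORM_TIMEOUTS = {
    "aws": 0,     # console cannot be used to catch the prompt
    "metal": 5,   # suggested starting subset
    "vmware": 5,  # suggested starting subset
}


def grub_timeout_for(platform: str) -> int:
    """Return the GRUB prompt timeout in seconds for a platform ID."""
    return PLATFORM_TIMEOUTS.get(platform.lower(), DEFAULT_TIMEOUT_SECONDS)


for name in ("aws", "metal", "gcp"):
    print(name, grub_timeout_for(name))
```

Keeping the default separate from the table means extending the mechanism later is a one-line change per platform.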

@dustymabe
Member Author

/hold

@jlebon
Member

jlebon commented Aug 23, 2021

This should be more tractable once we have coreos/fedora-coreos-tracker#110.

@jlebon
Member

jlebon commented Jun 20, 2022

@dustymabe I'd suggest transforming this into a tracker issue RFE and closing this since an implementation of it would likely extend the work we did for platform-specific console instead.

@dustymabe
Member Author

> @dustymabe I'd suggest transforming this into a tracker issue RFE and closing this since an implementation of it would likely extend the work we did for platform-specific console instead.

Broke out into coreos/fedora-coreos-tracker#1236

@dustymabe dustymabe closed this Jun 22, 2022
5 participants