Skip to content
This repository has been archived by the owner on Aug 25, 2021. It is now read-only.

Enforce that Ignition run once even if interrupted #137

Closed

Conversation

cgwalters
Copy link
Member

I have seen in practice us trying to run Ignition multiple times
if the system is forcibly rebooted before we run the firstboot
complete service, which actually today runs in the real rootfs.

Forcibly prevent this by adding a few tiny new shell scripts
which write a state file to /boot/ignition.state. The first
to run is ignition-prelaunch.service which runs even before
fetch and bombs out directly if we detect the stamp file
exists.

Next, we have a service which writes the stamp file after fetch
and before disks. Finally, another service updates the stamp
file just to signal complete.

Note none of these services run on live systems, as there's
no /boot there, and Ignition runs every boot anyways.

@ajeddeloh
Copy link
Contributor

We are planning to move the firstboot file deletion to the initramfs, wouldn't that also solve this issue?

@cgwalters
Copy link
Member Author

We are planning to move the firstboot file deletion to the initramfs, wouldn't that also solve this issue?

The problem I'm trying to solve here is more about handling the case where we're rebooted during e.g. the disks or files stages. Potentially, if a different config is provided on the subsequent boot, we could silently get part of the first config, and the rest of the second!

This is trying to make it an explicit fatal error if we're interrupted - moving deletion of the firstboot file to the initramfs doesn't help that, right? Because if we unlink it before we make changes, then if we're interrupted after unlink but before Ignition completes, we still have a partially configured system - and this wouldn't be obvious to admins! If we unlink it after, we have the same problem this is solving.

@bgilbert
Copy link
Contributor

The problem I'm trying to solve here is more about handling the case where we're rebooted during e.g. the disks or files stages. Potentially, if a different config is provided on the subsequent boot, we could silently get part of the first config, and the rest of the second!

In general, Ignition runs should be idempotent. (For a case where we fail at that, see coreos/ignition#642). There are error cases [1] which fail to an emergency shell, eventually reboot, and then intentionally try again. This change breaks that high-level recovery process.

The situation you've described, where a different config is provided on a subsequent boot, we've generally considered to be user error. If we're concerned about that case, I'd rather see us cache the rendered config and reuse it on subsequent attempts.

[1] For example, a transient server outage while fetching a remote resource in the files stage.

@cgwalters
Copy link
Member Author

In general, Ignition runs should be idempotent

I agree that'd be nice; however, in the general case with anything "destructive" such as e.g. rootfs reprovisioning support (and your linked example of append) it's going to be hard to achieve.

We hit this with RHCOS and firstboot encryption - trying to make that process idempotent is possible, but not easy.

The situation you've described, where a different config is provided on a subsequent boot, we've generally considered to be user error. If we're concerned about that case, I'd rather see us cache the rendered config and reuse it on subsequent attempts.

Yeah, it has come up before to cache the config in /boot. That would help us with other things too.

@jlebon
Copy link
Member

jlebon commented Nov 25, 2019

There are error cases [1] which fail to an emergency shell, eventually reboot, and then intentionally try again. This change breaks that high-level recovery process.

[1] For example, a transient server outage while fetching a remote resource in the files stage.

Doesn't Ignition already retry on transient errors? I'm not a fan of "reboot and retry everything" as part of the strategy for handling transient errors halfway through (we should be retrying/handling those on the spot). I think maintaining the stance of "Ignition runs only once" is more important. (One exception we can make is after ignition-fetch.service but before anything else since we haven't actually started modifying anything yet. RHCOS FIPS mode relies on this.)

That said, I think we should still allow forcing Ignition to rerun via ignition.firstboot=1. This is super useful in the developer case, and I would guess it's also useful for sysadmins debugging things too. And I think this PR breaks that, right?

One thing we could do here is only error out if both /boot/ignition.state and /boot/ignition.firstboot are present.

@jlebon
Copy link
Member

jlebon commented Nov 25, 2019

Something I was thinking about was whether we could just use the Ignition config itself as the stamp file instead of a separate ignition.state file (implementing the caching RFE at the same time). But we still need to be able to note when we cross that fetch -> disk red line.

@cgwalters
Copy link
Member Author

But we still need to be able to note when we cross that fetch -> disk red line.

Indeed, the implementation in this PR only writes the "starting ignition" stamp file after fetch has completed successfully.

@bgilbert
Copy link
Contributor

In general, Ignition runs should be idempotent

I agree that'd be nice; however, in the general case with anything "destructive" such as e.g. rootfs reprovisioning support (and your linked example of append) it's going to be hard to achieve.

Idempotency is implied by declarative provisioning, which is a design principle of Ignition. Technically, rootfs reprovisioning and firstboot encryption are outside the scope of Ignition proper, so those observations aren't incompatible. 🙂 In really destructive cases where we can't rerun, it might make sense to add an interlock as you do here. But I don't think we should do that in general. For one thing, it invalidates much of the point of the automatic 5-minute reboot timer.

Doesn't Ignition already retry on transient errors?

Some number of times, but not indefinitely.

I think maintaining the stance of "Ignition runs only once" is more important.

Historically that stance has meant that users can't rerun Ignition to produce a second set of user-visible effects. It has not meant that we don't rerun Ignition with the same config after a failure produces a partially-applied config, because idempotency implies that doing so should be completely safe. You're right that we haven't been clear about the distinction.

@ajeddeloh, any thoughts?

@cgwalters
Copy link
Member Author

cgwalters commented Dec 1, 2019

I think a lot of the point of what we're doing with our technologies is to move more closely towards "image based"/"declarative" systems - commonly having a single "source of truth" for the state of components. OSTree gives you a cryptographic checksum for the OS content, for which it is the source of truth. Similarly Ignition also has a checksum for its final fetched config and the intention is Ignition defines your whole /etc in an ideal case.

There are various benefits to moving to models like this - one of them is that one has a high degree of confidence in the state of the system. I think saying that passing different Ignition configs is fully user error isn't in line with our philosophy - while it's true it's unlikely users will do this intentionally in most cases, it could lead to really difficult to debug problems.

Exactly the same sort of difficult to debug problems that can arise with more "stateful" systems like traditional package managers and config systems like Ansible/Puppet where you can much more easily have "leftovers" after reconfiguring things (but on the flip side, they aim to support reconfiguration).

So I'm arguing that silently allowing partial Ignition runs isn't in line with our goals. We should e.g. fetch the config into /boot, and then at next fetch time if we detect we have a different SHA256, we error out. (Or alternatively, don't fetch if we have a config in /boot, which would help us for OpenShift on Azure due to gyrations with the checkin)
And, as this patch is trying to do, fatally error out if we detect we were interrupted in the middle of a possibly non-idempotent operation.

Now...one thing I would like to do is actually commit the fetched Ignition config at least for /etc into OSTree - see also openshift/machine-config-operator#1190 - if we did that, we could much more easily make append to an extant file transactional (always start appending to the /usr/etc copy, not /etc). And there shouldn't be any files in /var at Ignition time, so we don't need to worry about append for that.

Making disks idempotent seems really hard in the general case, probably easiest to write a flag file (like this PR is doing) at least during that operation.

@cgwalters
Copy link
Member Author

For one thing, it invalidates much of the point of the automatic 5-minute reboot timer.

Right...hm, why do we do that? Looks like that came from
fe96725
which references CL, so...
coreos/bootengine@185bde0#diff-9d5ebf142ff6d31d26688af8b5a4199f

This should prevent systems from being stuck forever at a prompt if there's a boot failure in the initramfs.

But what's the value in rebooting? In many scenarios we'd just end up in a reboot loop, can't see how that's better.

@bgilbert
Copy link
Contributor

We should e.g. fetch the config into /boot, and then at next fetch time if we detect we have a different SHA256, we error out. (Or alternatively, don't fetch if we have a config in /boot, which would help us for OpenShift on Azure due to gyrations with the checkin)

Works for me.

Now...one thing I would like to do is actually commit the fetched Ignition config at least for /etc into OSTree - see also openshift/machine-config-operator#1190 - if we did that, we could much more easily make append to an extant file transactional (always start appending to the /usr/etc copy, not /etc). And there shouldn't be any files in /var at Ignition time, so we don't need to worry about append for that.

I'm inclined to agree that we should store the original Ignition config somewhere for posterity. But Ignition can run on non-ostree systems, so I'd rather not see special casing there.

For one thing, it invalidates much of the point of the automatic 5-minute reboot timer.

Right...hm, why do we do that?

I had that backwards. The timer was solving a different problem: automatic rollback to the previous OS version if boot failed. It happens to address provisioning failures due to e.g. transient server outages, but that wasn't the goal.

@dustymabe
Copy link
Member

Now...one thing I would like to do is actually commit the fetched Ignition config at least for /etc into OSTree - see also openshift/machine-config-operator#1190 - if we did that, we could much more easily make append to an extant file transactional (always start appending to the /usr/etc copy, not /etc). And there shouldn't be any files in /var at Ignition time, so we don't need to worry about append for that.

I'm inclined to agree that we should store the original Ignition config somewhere for posterity. But Ignition can run on non-ostree systems, so I'd rather not see special casing there.

There's an existing feature request for something similar I think: coreos/ignition#903

@cgwalters
Copy link
Member Author

But Ignition can run on non-ostree systems, so I'd rather not see special casing there.

Right, we already have one non-ostree Ignition consumer. But while I think we all agree it's a good principle to keep technologies/features orthogonal where possible, at some point they do need to be wired together into a whole, and some of those interconnects can be quite valuable.

(Committing the expanded ignition to ostree would be like 5 lines of code probably anyways)

@bgilbert
Copy link
Contributor

Ignition doesn't need special rpm-ostree support to write out the rendered config somewhere persistent. I don't think rpm-ostree integration is needed to solve the idempotent-append problem either, and it wouldn't help other distros in any event.

Let's continue discussion on the latter topic in coreos/ignition#642.

I have seen in practice us trying to run Ignition multiple times
if the system is forcibly rebooted before we run the firstboot
complete service, which actually today runs in the real rootfs.

Forcibly prevent this by adding a few tiny new shell scripts
which write a state file to `/boot/ignition.state`.  The first
to run is `ignition-prelaunch.service` which runs even before
`fetch` and bombs out directly if we detect the stamp file
exists.

Next, we have a service which writes the stamp file after `fetch`
and before `disks`.  Finally, another service updates the stamp
file just to signal `complete`.

Note none of these services run on live systems, as there's
no `/boot` there, and Ignition runs every boot anyways.
@cgwalters cgwalters force-pushed the firstboot-fail-if-interrupted branch from 0fb5564 to 7fcbe24 Compare March 24, 2020 17:29
@cgwalters
Copy link
Member Author

Rebased 🏄‍♂️
This one got sidetracked on other discussions, but I think the core idea here still makes sense.

@cgwalters
Copy link
Member Author

This probably needs to use -diskful.target now too.

@darkmuggle
Copy link
Contributor

@cgwalters what would you like to do with this PR? Since we now have ignition-dracut in ignition it needs to be rebased on ignition or closed.

@bgilbert
Copy link
Contributor

Closing this out, since this repository is abandoned. Please reopen against coreos/ignition if still relevant.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants