-
Notifications
You must be signed in to change notification settings - Fork 27
Enforce that Ignition run once even if interrupted #137
Conversation
We are planning to move the firstboot file deletion to the initramfs, wouldn't that also solve this issue? |
The problem I'm trying to solve here is more about handling the case where we're rebooted during e.g. the disks or files stages. Potentially, if a different config is provided on the subsequent boot, we could silently get part of the first config, and the rest of the second! This is trying to make it an explicit fatal error if we're interrupted - moving deletion of the firstboot file to the initramfs doesn't help that, right? Because if we unlink it before we make changes, then if we're interrupted after unlink but before Ignition completes, we still have a partially configured system - and this wouldn't be obvious to admins! If we unlink it after, we have the same problem this is solving. |
In general, Ignition runs should be idempotent. (For a case where we fail at that, see coreos/ignition#642). There are error cases [1] which fail to an emergency shell, eventually reboot, and then intentionally try again. This change breaks that high-level recovery process. The situation you've described, where a different config is provided on a subsequent boot, we've generally considered to be user error. If we're concerned about that case, I'd rather see us cache the rendered config and reuse it on subsequent attempts. [1] For example, a transient server outage while fetching a remote resource in the files stage. |
I agree that'd be nice; however, in the general case with anything "destructive" such as e.g. rootfs reprovisioning support (and your linked example of append) it's going to be hard to achieve. We hit this with RHCOS and firstboot encryption - trying to make that process idempotent is possible, but not easy.
Yeah, it has come up before to cache the config in |
Doesn't Ignition already retry on transient errors? I'm not a fan of "reboot and retry everything" as part of the strategy for handling transient errors halfway through (we should be retrying/handling those on the spot). I think maintaining the stance of "Ignition runs only once" is more important. (One exception we can make is after That said, I think we should still allow forcing Ignition to rerun via One thing we could do here is only error out if both |
Something I was thinking about was whether we could just use the Ignition config itself as the stamp file instead of a separate |
Indeed, the implementation in this PR only writes the "starting ignition" stamp file after |
Idempotency is implied by declarative provisioning, which is a design principle of Ignition. Technically, rootfs reprovisioning and firstboot encryption are outside the scope of Ignition proper, so those observations aren't incompatible. 🙂 In really destructive cases where we can't rerun, it might make sense to add an interlock as you do here. But I don't think we should do that in general. For one thing, it invalidates much of the point of the automatic 5-minute reboot timer.
Some number of times, but not indefinitely.
Historically that stance has meant that users can't rerun Ignition to produce a second set of user-visible effects. It has not meant that we don't rerun Ignition with the same config after a failure produces a partially-applied config, because idempotency implies that doing so should be completely safe. You're right that we haven't been clear about the distinction. @ajeddeloh, any thoughts? |
I think a lot of the point of what we're doing with our technologies is to move more closely towards "image based"/"declarative" systems - commonly having a single "source of truth" for the state of components. OSTree gives you a cryptographic checksum for the OS content, for which it is the source of truth. Similarly Ignition also has a checksum for its final fetched config and the intention is Ignition defines your whole There are various benefits to moving to models like this - one of them is that one has a high degree of confidence in the state of the system. I think saying that passing different Ignition configs is fully user error isn't in line with our philosophy - while it's true it's unlikely users will do this intentionally in most cases, it could lead to really difficult to debug problems. Exactly the same sort of difficult to debug problems that can arise with more "stateful" systems like traditional package managers and config systems like Ansible/Puppet where you can much more easily have "leftovers" after reconfiguring things (but on the flip side, they aim to support reconfiguration). So I'm arguing that silently allowing partial Ignition runs isn't in line with our goals. We should e.g. fetch the config into Now...one thing I would like to do is actually commit the fetched Ignition config at least for Making disks idempotent seems really hard in the general case, probably easiest to write a flag file (like this PR is doing) at least during that operation. |
Right...hm, why do we do that? Looks like that came from
But what's the value in rebooting? In many scenarios we'd just end up in a reboot loop, can't see how that's better. |
Works for me.
I'm inclined to agree that we should store the original Ignition config somewhere for posterity. But Ignition can run on non-ostree systems, so I'd rather not see special casing there.
I had that backwards. The timer was solving a different problem: automatic rollback to the previous OS version if boot failed. It happens to address provisioning failures due to e.g. transient server outages, but that wasn't the goal. |
There's an existing feature request for something similar I think: coreos/ignition#903 |
Right, we already have one non-ostree Ignition consumer. But while I think we all agree it's a good principle to keep technologies/features orthogonal where possible, at some point they do need to be wired together into a whole, and some of those interconnects can be quite valuable. (Committing the expanded ignition to ostree would be like 5 lines of code probably anyways) |
Ignition doesn't need special rpm-ostree support to write out the rendered config somewhere persistent. I don't think rpm-ostree integration is needed to solve the idempotent-append problem either, and it wouldn't help other distros in any event. Let's continue discussion on the latter topic in coreos/ignition#642. |
I have seen in practice us trying to run Ignition multiple times if the system is forcibly rebooted before we run the firstboot complete service, which actually today runs in the real rootfs. Forcibly prevent this by adding a few tiny new shell scripts which write a state file to `/boot/ignition.state`. The first to run is `ignition-prelaunch.service` which runs even before `fetch` and bombs out directly if we detect the stamp file exists. Next, we have a service which writes the stamp file after `fetch` and before `disks`. Finally, another service updates the stamp file just to signal `complete`. Note none of these services run on live systems, as there's no `/boot` there, and Ignition runs every boot anyways.
0fb5564
to
7fcbe24
Compare
Rebased 🏄♂️ |
This probably needs to use |
@cgwalters what would you like to do with this PR? Since we now have ignition-dracut in ignition it needs to be rebased on ignition or closed. |
Closing this out, since this repository is abandoned. Please reopen against coreos/ignition if still relevant. |
I have seen in practice us trying to run Ignition multiple times
if the system is forcibly rebooted before we run the firstboot
complete service, which actually today runs in the real rootfs.
Forcibly prevent this by adding a few tiny new shell scripts
which write a state file to
/boot/ignition.state
. The firstto run is
ignition-prelaunch.service
which runs even beforefetch
and bombs out directly if we detect the stamp fileexists.
Next, we have a service which writes the stamp file after
fetch
and before
disks
. Finally, another service updates the stampfile just to signal
complete
.Note none of these services run on live systems, as there's
no
/boot
there, and Ignition runs every boot anyways.