Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the ESP by creating a tmpdir and RENAME_EXCHANGE #454

Closed
martinezjavier opened this issue Mar 30, 2023 · 11 comments · Fixed by #669
Closed

Update the ESP by creating a tmpdir and RENAME_EXCHANGE #454

martinezjavier opened this issue Mar 30, 2023 · 11 comments · Fixed by #669
Assignees
Labels
enhancement New feature or request jira For syncing with Jira triaged This issue was triaged

Comments

@martinezjavier
Copy link

Since Linux v6.0 the vfat filesystem has support for renameat2(..., RENAME_EXCHANGE):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da87e1725ae2

So bootupd could instead of applying the diff files one by one to the destination dir, it could create a temporary dir that is a copy of the existing ESP, apply the diff to that temp dir and finally do an atomic rename exchange of the two directories.

That way, the update mechanism will be safer since it would only require a single renameat2() system call.

@martinezjavier martinezjavier changed the title Make update the ESP by creating a tmpdir and renameat2(..., RENAME_EXCHANGE) Update the ESP by creating a tmpdir and RENAME_EXCHANGE to make it atomically Mar 30, 2023
@martinezjavier martinezjavier changed the title Update the ESP by creating a tmpdir and RENAME_EXCHANGE to make it atomically Update the ESP by creating a tmpdir and RENAME_EXCHANGE Mar 30, 2023
@martinezjavier
Copy link
Author

openat::Dir already has a local_rename that seems to support this: https://docs.rs/openat/latest/openat/struct.Dir.html#method.local_rename

@cgwalters
Copy link
Member

The cost here is write amplification; because FAT doesn't have reflinks, every update to the ESP would require rewriting every file, and a transient doubling of disk space.

@travier
Copy link
Member

travier commented Jun 19, 2024

It looks like local_exchange does what we want in Rust:

It calls renameat2 with RENAME_EXCHANGE.

@travier
Copy link
Member

travier commented Jun 19, 2024

So the logic would be:

  • cp -a fedora .fedora.tmp
    • We start with a copy to make sure to keep all other files that we do not explicitly track in bootupd
  • Update the content of .fedora.tmp with the new binaries
  • Exchange .fedora.tmp -> fedora
  • Remove now "old" .fedora.tmp

HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 19, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 19, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 19, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 19, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 19, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 20, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 20, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 20, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 20, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 20, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 20, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 20, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 21, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 21, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 21, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jun 21, 2024
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 2, 2024
See Timothée's comment coreos#454 (comment)
logic is like this:
- `cp -a fedora fedora.tmp`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `fedora.tmp` with the new binaries
- Exchange `fedora.tmp` -> `fedora`
- Remove now "old" `fedora.tmp`

If we have a file not in a directory in `EFI`, then we can copy
it to `foo.tmp` and then act on it and finally rename it. No
need to copy the entire `EFI`.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 2, 2024
See Timothée's comment coreos#454 (comment)
logic is like this:
- `cp -a fedora fedora.tmp`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `fedora.tmp` with the new binaries
- Exchange `fedora.tmp` -> `fedora`
- Remove now "old" `fedora.tmp`

If we have a file not in a directory in `EFI`, then we can copy
it to `foo.tmp` and then act on it and finally rename it. No
need to copy the entire `EFI`.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 2, 2024
See Timothée's comment coreos#454 (comment)
logic is like this:
- `cp -a fedora fedora.tmp`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `fedora.tmp` with the new binaries
- Exchange `fedora.tmp` -> `fedora`
- Remove now "old" `fedora.tmp`

If we have a file not in a directory in `EFI`, then we can copy
it to `foo.tmp` and then act on it and finally rename it. No
need to copy the entire `EFI`.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 4, 2024
See Timothée's comment coreos#454 (comment)
logic is like this:
- `cp -a fedora fedora.tmp`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `fedora.tmp` with the new binaries
- Exchange `fedora.tmp` -> `fedora`
- Remove now "old" `fedora.tmp`

If we have a file not in a directory in `EFI`, then we can copy
it to `foo.tmp` and then act on it and finally rename it. No
need to copy the entire `EFI`.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 8, 2024
See Timothée's comment coreos#454 (comment)
logic is like this:
- `cp -a fedora fedora.tmp`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `fedora.tmp` with the new binaries
- Exchange `fedora.tmp` -> `fedora`
- Remove now "old" `fedora.tmp`

If we have a file not in a directory in `EFI`, then we can copy
it to `foo.tmp` and then act on it and finally rename it. No
need to copy the entire `EFI`.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 8, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 8, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 9, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 9, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 9, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 9, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 11, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 12, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 17, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Jul 18, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
HuijingHei added a commit to HuijingHei/bootupd that referenced this issue Aug 1, 2024
See Timothée's comment coreos#454 (comment)
Reuse `TMP_PREFIX`, logic is like this:
- `cp -a fedora .btmp.fedora`
  - We start with a copy to make sure to keep all other files
that we do not explicitly track in bootupd
- Update the content of `.btmp.fedora` with the new binaries
- Exchange `.btmp.fedora` -> `fedora`
- Remove now "old" `.btmp.fedora`

If we have a file not in a directory in `EFI`, then we can copy
it to `.btmp.foo` and then act on it and finally rename it. No
need to copy the entire `EFI`.

And use `insert()` instead of `push()` to match `starts_with()`
when scanning temp files & dirs.
Fixes coreos#454
@cgwalters
Copy link
Member

One thing I wanted to note here that's pretty important is: #669 (comment)

Basically the PR we landed made this "safer" but still not transactional.

@HuijingHei
Copy link
Member

One thing I wanted to note here that's pretty important is: #669 (comment)

Basically the PR we landed made this "safer" but still not transactional.

Could you help to create a new issue for this and it would be better to add more instructions / pointers ?

@travier
Copy link
Member

travier commented Nov 6, 2024

One thing I wanted to note here that's pretty important is: #669 (comment)

Basically the PR we landed made this "safer" but still not transactional.

It's only not transactional in the case where there would be a GRUB update that also depends on a shim update. And as far I know, this remains a theoretical concern. Any independent shim or GRUB update should be fine.

It's indeed an issue if a shim update, which is applied first (BOOT then fedora) requires GRUB to be updated as well, and the update fails in the middle of the two atomic renames.

For example if a signing key is removed from shim and thus the GRUB bootloader has to be signed by a newer key. What happend in Fedora was that GRUB was signed by both the old and new key for a while, so a failing update would not have caused breakage.

Only systems that would move directly from an release without the new key to a release with only the new key would be impacted.

Has such events are rare (signing key rotation in shim), I think it's reasonable to expect that they are covered by other means (barrier updates in Fedora CoreOS for example).

@cgwalters
Copy link
Member

Has such events are rare (signing key rotation in shim), I think it's reasonable to expect that they are covered by other means (barrier updates in Fedora CoreOS for example).

That's just it though, the barriers are I think as you know a perfect example of something that only applies to FCOS and just heavily conflicts with containers as Source of Truth.

But yes. Hmmm....maybe bootupd could learn to distinguish these cases.

I guess the action item here is to update the README.md to more precisely describe the safety.

@travier
Copy link
Member

travier commented Dec 5, 2024

Coming back to this, I don't think my previous assessment was correct.

What I had not realized is that we have shim & grub in the same folder in the ESP, so given that we now update using an atomic folder switch, they will always be updated together at the same time atomically:

# tree /boot/efi/EFI
.
├── BOOT
│   ├── BOOTX64.EFI
│   └── fbx64.efi
└── fedora
    ├── bootuuid.cfg
    ├── BOOTX64.CSV
    ├── fw
    ├── fwupdx64.efi
    ├── grub.cfg
    ├── grubia32.efi
    ├── grubx64.efi
    ├── mmx64.efi
    └── shimx64.efi

The boot entries in the EFI firmware always point to the one in the fedora folder:

$ efibootmgr
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,...
Boot0000* ...
Boot0001* ...
Boot0002* Fedora Linux  HD(1,GPT,...)/\EFI\fedora\shimx64.efi

Thus if an update fails midway, the content of the fedora folder is always consistent and never partially updated.

I now think that the only case where such an update could result to a failing system is if

  • all the entries in the firmware are removed, forcing the firmware to boot from the fallback path, booting /EFI/BOOT/BOOTX64.EFI
  • the update gets aborted midway and BOOT is updated but not fedora or the reverse

Then this gets into how BOOTX64.EFI actually works and how it sets things up for booting again. From memory, it re-creates a firmware entry and sets it as BootNext or the new default and then reboots (? TBC). If that's the case, then having this binary not be updated in sync with the rest does not matter.

If it then directly EFI chainboot(?) to the GRUB in the fedora folder, then we could get a boot failure.

Overall, I think at this point this is very unlikely to ever happen and we should focus on #440 which would help the case where the new shim binary fails to boot.

@HuijingHei
Copy link
Member

Not sure my understanding is correct, if update fails midway, for example fedora directory is not updated, we might boot with old shimx64.efi and new kernel, it might have a risk to boot failed?

@travier
Copy link
Member

travier commented Dec 13, 2024

Yes, as we only update the bootloader on boot right now, if an update both introduces a new signing key for the kernel in shim and removes the old key then the system would fail to boot the new deployment as it would not have the new key yet. But the previous deployment should still be bootable so overall the system is not completely broken.

To account for that case, we should try to update the bootloader from the pending deployment during an update or at finalization time. This is tracked in #108 and relates to #440.

But note that fully switching the signing key in one update means that it would break booting the previous (current) deployment as well, so there should be a period of overlap where both keys are available in shim.

travier pushed a commit to travier/bootupd that referenced this issue Dec 18, 2024
Add a systemd service unit to trigger an adoption and update on every
boot.

Note that the service is intentionally not enabled by default as it
should be up to the distribution to add a systemd preset if auto-update
for the bootloader is desired.

This unit does not come with any specific restrictions (i.e. EFI or BIOS
only). For an assesment of the safety of updates as performed by
bootupd, see coreos#454.

Distributiuons should also apply the restrictions (i.e. EFI or BIOS only
for example) as needed as unit files overrides.

Notably, Fedora CoreOS can not yet enable automatic updates until we get
support for the multiple EFI partitions for RAID setups.

See: coreos/fedora-coreos-tracker#1468
See: coreos/fedora-coreos-config#3042

Initial version from: coreos#716
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2332868
travier pushed a commit to travier/bootupd that referenced this issue Dec 18, 2024
Add a systemd service unit to trigger an adoption and update on every
boot.

Note that the service is intentionally not enabled by default as it
should be up to the distribution to add a systemd preset if auto-update
for the bootloader is desired.

This unit does not come with any specific restrictions (i.e. EFI or BIOS
only). For an assesment of the safety of updates as performed by
bootupd, see coreos#454.

Distributiuons should also apply the restrictions (i.e. EFI or BIOS only
for example) as needed as unit files overrides.

Notably, Fedora CoreOS can not yet enable automatic updates until we get
support for the multiple EFI partitions for RAID setups.

See: coreos/fedora-coreos-tracker#1468
See: coreos/fedora-coreos-config#3042

Initial version from: coreos#716
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2332868
travier pushed a commit to travier/bootupd that referenced this issue Dec 18, 2024
Add a systemd service unit to trigger an adoption and update on every
boot.

Note that the service is intentionally not enabled by default as it
should be up to the distribution to add a systemd preset if auto-update
for the bootloader is desired.

This unit does not come with any specific restrictions (i.e. EFI or BIOS
only). For an assesment of the safety of updates as performed by
bootupd, see coreos#454.

Distributiuons should also apply the restrictions (i.e. EFI or BIOS only
for example) as needed as unit files overrides.

Notably, Fedora CoreOS can not yet enable automatic updates until we get
support for the multiple EFI partitions for RAID setups.

See: coreos/fedora-coreos-tracker#1468
See: coreos/fedora-coreos-config#3042

Initial version from: coreos#716
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2332868
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request jira For syncing with Jira triaged This issue was triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants