40ignition-ostree: add autosave-xfs transposefs unit #2320
Conversation
# Semi-arbitrarily chosen: this is roughly ~64G currently (based on initial
# ag sizing at build time) which seems like a good rootfs size at which to
# discriminate between "throwaway/short-lived systems" and "long-running
# workload systems". It's not like XFS performance is way worse at 128.
if [ "$agcount" -lt 128 ]; then
@sandeen, ended up going with 128 as a threshold for this.
Gave this a read, looks good.
@@ -28,6 +28,8 @@ install() {
        systemd-sysusers \
        systemd-tmpfiles \
        sort \
        xfs_info \
        xfs_spaceman \
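For context, dracut modules copy binaries into the initramfs from the install() hook of their module-setup.sh; a minimal sketch using dracut's standard inst_multiple helper (file layout assumed, not quoted from the PR):

    # module-setup.sh (sketch): make the XFS tools available in the
    # initramfs so the transposefs script can inspect allocation groups
    # at boot.
    install() {
        inst_multiple \
            xfs_info \
            xfs_spaceman
    }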
🧑‍🚀
Force-pushed from 7cacd47 to fe72963
Rebased for conflicts now! With FCOS releases freshly out, I think we can get this in.
Mostly LGTM - some suggestions
overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-growfs.sh (resolved)
overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh (outdated, resolved)
overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh (outdated, resolved)
Prep for automatic XFS reprovisioning. The only way to know for sure whether the rootfs should be reprovisioned is to analyze it after the filesystem has been grown. We could do calculations beforehand, but it'd get complex having to analyze the partition table. Anyway, the partition growing and e.g. LUKS container resizing need to happen before automatic reprovisioning, and ignition-ostree-growfs already knows how to do that.
No functional change. Prep for future patch.
Currently, this code only executes when Ignition reprovisions the rootfs, but we're about to add code to reprovision the rootfs outside of that path. In that case, we don't need to query the Ignition config.
Add a new transposefs unit: `autosave-xfs`. This unit runs after `ignition-disks` and `ignition-ostree-growfs`, but before the `restore` transposefs unit.

If the XFS root was grown, it checks whether the allocation group count (agcount) is within a reasonable amount (128 is chosen here). If it isn't, it saves the rootfs and reformats the filesystem. The `restore` unit will then restore it as usual. In the case of in-place reprovisioning like LUKS (i.e. where the partition table isn't modified by the Ignition config), the rootfs is still saved only once.

Ideally, instead of adding a new transposefs unit, we would make it part of the initial `save` unit. But at that point, there's no way to tell whether we should autosave without gazing even more deeply into the Ignition config. We also don't want to unconditionally save the rootfs when we may not need it.

Closes: coreos/fedora-coreos-tracker#1183
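For orientation, a rough sketch of the save-then-reformat flow described above. The helper name, save path, and label are assumptions for illustration, not the PR's exact code; the real unit reuses the existing transposefs save/restore machinery:

    root=/dev/disk/by-label/root
    if [ "$(should_autosave_rootfs)" = "1" ]; then
        mount "${root}" /sysroot
        cp -a /sysroot/. /run/ignition-ostree-rootfs   # save (hypothetical path)
        umount /sysroot
        mkfs.xfs -f -L root "${root}"                  # reformat; mkfs picks fresh AG sizing
    fi
    # the later `restore` transposefs unit copies the saved tree back as usual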
In the non-reprovisioning case, we weren't checking anywhere that the root filesystem is grown on first boot. Add a trivial test for this.
Prep for adding another LUKS test where we want the same checks.
Add a new `ext.config.root-reprovision.autosave-xfs` test that checks that the logic kicks in on a large enough disk. Add a similar `ext.config.root-reprovision.luks.autosave-xfs` test for the LUKS version of this. Sanity-check in the other reprovisioning tests that autosave-xfs didn't kick in.
Force-pushed from fe72963 to 2a3c4aa
Updated for comments!
@@ -202,6 +244,23 @@ case "${1:-}" in
            mkdir "${saved_prep}"
        fi
        ;;
    autosave-xfs)
        should_autosave=$(should_autosave_rootfs)
so (by design) now the script will exit here if an unexpected error occurs (because of `set -euo pipefail`) and that is how we'll handle the cases we weren't handling before?
Yup, exactly. It's no longer part of an if-statement, so the errexit should be respected.
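A small self-contained example of that bash behavior, for reference:

    #!/bin/bash
    set -euo pipefail

    check() { return 1; }

    # As part of an if-condition, the failure is just treated as "false":
    # errexit is suppressed and the script keeps going.
    if check; then echo "yes"; fi
    echo "still running"

    # As a standalone statement, the assignment takes the exit status of
    # the command substitution, so the script aborts here under set -e.
    result=$(check)
    echo "never reached"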
@@ -202,6 +244,23 @@ case "${1:-}" in
            mkdir "${saved_prep}"
        fi
        ;;
    autosave-xfs)
        should_autosave=$(should_autosave_rootfs)
        if [ "${should_autosave}" = "1" ]; then
if [ "${should_autosave}" = "1" ]; then | |
if [ "${should_autosave}" == "1" ]; then |
It always throws me off that a single `=` works.
You don't need to change this, I'm just calling it out.
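For reference, POSIX test(1) only specifies `=` for string comparison; `==` is a bash extension (and the usual spelling inside `[[ ]]`):

    x=1
    [ "$x" = "1" ] && echo "POSIX-portable"       # works in any sh
    [ "$x" == "1" ] && echo "bash extension"      # bash's [ accepts == too
    [[ "$x" == "1" ]] && echo "bash keyword test" # [[ ]] takes either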
LGTM
Downstream bug https://issues.redhat.com/browse/OCPBUGS-16157
This was added to force a reprovision of the root filesystem on a particular instance we were using for a RHCOS builder, but it's no longer needed. Now when a system has a disk > 100G the root filesystem will get reprovisioned automatically by the work done in coreos/fedora-coreos-config#2320
Further fallout from this in https://issues.redhat.com/browse/OCPBUGS-16724
The change in coreos#2320 has been very problematic for OpenShift because our default node configuration is *always* over the threshold, and that causes significant latency on instance provisioning.

First, rework the reprovision threshold to operate in terms of disk size, which is much easier to explain and debug than allocation group count. (Which to be clear, *is* the real problem, but disk size is a good enough proxy for this)

Then, bump the reprovision threshold to 1TiB. This is comfortably above the default OpenShift node root disk sizes, and returns us to the prior status quo.
Followup PR in #2565
The change in coreos#2320 has been very problematic for OpenShift because our default node configuration is *always* over the threshold, and that causes significant latency on instance provisioning.

Experimentally bumping to 400 allocation groups, which is about 700GiB. This is comfortably above the default OpenShift node root disk sizes, and returns us to the prior status quo.

While we're here, rework the logging a bit so that we *always* log the `agcount` for debugging purposes. Also:

- Only log to stdout for normal conditions
- Include the name of the systemd unit in the test description so we can cross-reference
- tests: Hoist the expected agcount of 4 to a common variable
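A sketch of the reworked logging, with the 400 threshold from the commit message above; the variable setup is as in the earlier sketch, and the exact message wording is illustrative:

    # log the agcount unconditionally so it always appears in the journal
    echo "found rootfs agcount=${agcount} (autosave threshold: 400)"
    if [ "${agcount}" -gt 400 ]; then
        echo "autosaving rootfs to reset allocation group count"
    fi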