-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
overlay node image before bootstrapping if necessary #8742
base: main
Are you sure you want to change the base?
Conversation
Skipping CI for Draft Pull Request. |
/test e2e-aws-ovn |
/retest |
Sweet! 🎉 Looks like e2e-aws-ovn-heterogeneous failed on a different issue. |
/retest |
/test e2e-agent-compact-ipv4 |
/test ? |
@zaneb: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test e2e-metal-assisted |
data/data/bootstrap/files/usr/local/bin/node-image-pull.sh.template
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/node-image-pull.sh.template
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/etc/systemd/system/node-image-pull.service
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/node-image-pull.sh.template
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/node-image-pull.sh.template
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/node-image-pull.sh.template
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/node-image-pull.sh.template
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/usr/local/bin/node-image-pull.sh.template
Outdated
Show resolved
Hide resolved
data/data/bootstrap/files/etc/systemd/system/node-image-pivot.target
Outdated
Show resolved
Hide resolved
Just surfacing some insights from internal discussions here. The AI test here is not using the ISO from The ABI tests here are not using the ISO from the |
It's odd to me that |
It's tightly coupled in ART builds because machine-os-images depends on the installer image. But upstream CI doesn't have any notion of dependencies between container images. The CI payload just contains the latest master build of every container image. So machine-os-images only gets rebuilt when a PR merges to it, regardless of any changes to the installer repo (let alone unmerged PRs in the installer repo). Arguably we chose the wrong ISO here, and in the event of a conflict between what rhcos.json says and what's in the payload, we should go with downloading the former so that we can CI changes like this. The assumption is that the payload is canonical, but that's not really true in CI, and CI is theoretically the only place they can differ. (I wonder if this would break disconnected CI tests though - @bfournie do you know? It'd also mean we'd be executing different code paths in CI than users are actually getting in reality.) |
This job is not required and expected to fail until #8698 merges. |
9f5f3ee
to
d6b8f87
Compare
OK, updated! Didn't rebase to make the diff easier, but also because I don't expect this to be merged before the associated enhancement is merged. But changes now include:
This works with Assisted Installer as well but requires a patch there. I'll open that one shortly. |
/test e2e-aws-ovn |
d6b8f87
to
fedeaad
Compare
OK, ended up doing a full rebase so try to get CI going. /test e2e-aws-ovn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks sane to me
data/data/bootstrap/files/usr/local/bin/node-image-pull.sh.template
Outdated
Show resolved
Hide resolved
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be oc, kubelet, or crio binaries for example, which bootstrapping obviously relies on. To adapt to this, the OpenShift installer now ships a new `node-image-overlay.service` in its bootstrap Ignition config. This service takes care of pulling down the node image and overlaying it, effectively updating the system to the node image version. Here, we accordingly also adapt assisted-installer so that we run `node-image-overlay.service` before starting e.g. `kubelet.service` and `bootkube.service`. See also: openshift/installer#8742
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be oc, kubelet, or crio binaries for example, which bootstrapping obviously relies on. To adapt to this, the OpenShift installer now ships a new `node-image-overlay.service` in its bootstrap Ignition config. This service takes care of pulling down the node image and overlaying it, effectively updating the system to the node image version. Here, we accordingly also adapt assisted-installer so that we run `node-image-overlay.service` before starting e.g. `kubelet.service` and `bootkube.service`. See also: openshift/installer#8742
fedeaad
to
81af230
Compare
Updated for feedback and updated commit and PR title to drop "pivot" wording. |
81af230
to
eb220ec
Compare
/test e2e-aws-ovn |
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be oc, kubelet, or crio binaries for example, which bootstrapping obviously relies on. To adapt to this, the OpenShift installer now ships a new `node-image-overlay.service` in its bootstrap Ignition config. This service takes care of pulling down the node image and overlaying it, effectively updating the system to the node image version. Here, we accordingly also adapt assisted-installer so that we run `node-image-overlay.service` before starting e.g. `kubelet.service` and `bootkube.service`. See also: openshift/installer#8742
a50dd02
to
4c29e7f
Compare
One update I did here is in the live ISO case, we now nuke the temporary repo completely after the checkout. This saves precious RAM space so we can fit within the 16G minimum requirements for SNO. /test e2e-aws-ovn |
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be `oc`, `kubelet`, or `crio` binaries for example, which bootstrapping obviously relies on. Instead, now we change things up so that early on when booting the bootstrap node, we pull down the node image, unencapsulate it (this just means convert it back to an OSTree commit), then mount over its `/usr`, and import new `/etc` content. This is done by isolating to a different systemd target to only bring up the minimum number of services to do the pivot and then carry on with bootstrapping. This does not incur additional reboots and should be compatible with AI/ABI/SNO. But it is of course, a huge conceptual shift in how bootstrapping works. With this, we would now always be sure that we're using the same binaries as the target version as part of bootstrapping, which should alleviate some issues such as AI late-binding (see e.g. https://issues.redhat.com/browse/MGMT-16705). The big exception of course being the kernel. Relatedly, note we do persist `/usr/lib/modules` from the booted system so that loading kernel modules still works. To be conservative, the new logic only kicks in when using bootimages which do not have `oc`. This will allow us to ratchet this in more easily. Down the line, we should be able to replace some of this with `bootc apply-live` once that's available (and also works in a live environment). (See containers/bootc#76.) For full context, see the linked enhancement and discussions there.
golint was complaining about: ``` pkg/asset/ignition/bootstrap/common.go:406:2: ifElseChain: rewrite if-else to switch statement (gocritic) if parentDir == "bin" || parentDir == "dispatcher.d" || parentDir == "system-generators" { ^ ```
These bootimages are RHEL-versioned and do not have any OCP components in them.
4c29e7f
to
e9faa37
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: cgwalters The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Rebased this now! Leaving it in draft for now, because I didn't retest things, but feel free to start reviewing. Now that openshift/enhancements#1637 has merged, we will need this. /test e2e-aws-ovn |
OK yup, still working fine! At least, testing in a local VM showed bootstrapping got far enough to tell me that the node overlay was successfully and There's a DNM commit in this PR to ensure that CI tests run against bootimages . But once we're ready to merge this, I'll remove that commit. One key thing to understand is that all the logic added in this PR is a no-op on the current bootimages which do ship OCP components. It only activates on the el-only bootimages. So we'll only be using this logic for real in that future bootimage bump where we decide to move over to the el-only bootimages. |
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be oc, kubelet, or crio binaries for example, which bootstrapping obviously relies on. To adapt to this, the OpenShift installer now ships a new `node-image-overlay.service` in its bootstrap Ignition config. This service takes care of pulling down the node image and overlaying it, effectively updating the system to the node image version. Here, we accordingly also adapt assisted-installer so that we run `node-image-overlay.service` before starting e.g. `kubelet.service` and `bootkube.service`. See also: openshift/installer#8742
/retest |
@jlebon: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.
This means that there will no longer be
oc
,kubelet
, orcrio
binaries for example, which bootstrapping obviously relies on.
Instead, now we change things up so that early on when booting the
bootstrap node, we pull down the node image, unencapsulate it (this just
means convert it back to an OSTree commit), then mount over its
/usr
,and import new
/etc
content.This is done by isolating to a different systemd target to only bring
up the minimum number of services to do the pivot and then carry on
with bootstrapping.
This does not incur additional reboots and should be compatible
with AI/ABI/SNO. But it is of course, a huge conceptual shift in how
bootstrapping works. With this, we would now always be sure that we're
using the same binaries as the target version as part of bootstrapping,
which should alleviate some issues such as AI late-binding (see e.g.
https://issues.redhat.com/browse/MGMT-16705).
The big exception of course being the kernel. Relatedly, currently
/usr/lib/modules
is also shadowed by the mount, but we could re-mountit if needed.
To be conservative, the new logic only kicks in when using bootimages
which do not have
oc
. This will allow us to ratchet this in moreeasily.
Down the line, we should be able to replace some of this with
bootc apply-live
once that's available (and also works in a liveenvironment). (See containers/bootc#76.)
For full context, see the linked enhancement and discussions there.