coreos-bootimage-streams: Standardized CoreOS bootimage metadata #679
Conversation
(force-pushed from 8e7c127 to 8d08f9b)
enhancements/coreos-bootimages.md (Outdated)

> #### Story 1
>
> An OpenShift core developer can write a PR which reads the configmap from the cluster and acts on it to update the machinesets to e.g. use a new AMI.
This use case raises a question: when I grow a MachineSet, is my cluster required to boot new hosts using the OS image referenced in the configmap? Or can it use an older one? How much older?
I understand that one of your goals (described in #201 and maybe worthwhile to repeat as a goal on this proposal) is to "Avoid accumulation of technical debt across OS updates". If I grow my 4.9 cluster, you don't want me using a 4.2 boot image. Makes sense. But this proposal is just helping my cluster understand what the very latest boot image is for its current cluster version. It does not help to set a floor on the supported boot image version, unless this implies that the cluster must use only the image referenced in the configmap.
Let's say my cluster is not connected to the internet. I've figured out how to mirror openshift releases into my local container registry to facilitate cluster updates. Maybe I even have a local cincinnati deployed. With every cluster update, must I also now download the latest boot image and place it on my local http mirror?
(I assume that somewhere else we'll solve the problem of how the theoretical automation looking at the configmap and updating MachineSets knows about my local OS image mirror and does some sort of translation, similar to how ImageContentSourcePolicies work.)
I think this is covered by the "Non-goal: Replacing the default in-place update path". So yes, we can't require it anytime soon.
(Reusing this sub-thread to just say thanks @mhrivnak for the thoughtful review!)
@romfreiman if by "optimization" you mean we might be able to skip the early pivot by having the latest boot ISO, I'm not sure that will work out very often. @cgwalters:
- Is it ever possible that the early pivot can be skipped because the boot image has the same version as the machine-os-content container?
- If "yes", any thoughts on how often that might be the case?
This is just in the context of bare metal where an extra reboot costs up to 15 minutes, plus however long it takes to retrieve and apply the machine-os-content update. So it's desirable to write the correct RHCOS version to disk the first time if possible.
(force-pushed from eb61f99 to 0aad02f)
OK I updated this one to call for binary patching the installer, the same way we do the release image, instead of duplicating the bootimages there. That gets openshift/installer entirely out of the …
enhancements/coreos-bootimages.md (Outdated)

> metadata, change `oc adm release new` to *binary patch* the included `openshift-install`
> binary, in the same way we [do for the release image](https://github.com/openshift/installer/pull/1422).
>
> The RHCOS team will take over maintenance, CI, and automation of `rhel-coreos-bootimages`. For OKD,
> The RHCOS team will take over maintenance, CI, and automation of `rhel-coreos-bootimages`.

What does this entail?
To start, we would do PRs to that repository to bump it in the same way we do openshift/installer. See the prototype at https://github.com/cgwalters/rhel-coreos-bootimages
But we could switch to having a bot do PRs after new cosa builds happen, or something else.
enhancements/coreos-bootimages.md (Outdated)

> The installer needs CoreOS to start at all; today the bootimages are used for the bootstrap node
> and control plane which are created by the installer directly. In order to avoid duplicating the
> metadata, change `oc adm release new` to *binary patch* the included `openshift-install`
Will the binary patch inject the entire stream metadata into the installer? What is the process that a developer (or a libvirt user) running a locally built installer would use? Would they be expected to do a binary patch as well?
Today the installer, when built from git master, defaults to an OKD release image with `rhcos.json`, which just doesn't work at all - this is a common stumbling block for people.

I think probably the simplest thing here is that when building from the git `master` branch, we default to using `oc` to binary patch in the latest release image data.
I am confused about the statement that it "just doesn't work at all". I frequently do successful installations using the installer built from git master without needing to make any changes for the RHCOS image.
When you run your locally-built installer, do you provide the env var `OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE`?

What I and others have hit is pretty simple. The default release image is OKD:

    $ oc image info registry.ci.openshift.org/origin/release:4.8 | grep -i release=
    Labels: io.openshift.release=4.8.0-0.okd-2021-03-07-094455

Yet the default pinned bootimages are RHCOS. That means e.g. the bootstrap node is RHCOS, yet we're pulling down the OKD release image, which has caused various problems, at least in the past.

We added support to xokdinst to streamline overriding the release image, to help with this for example.
enhancements/coreos-bootimages.md (Outdated)

> This command will extract the CoreOS bootimage stream metadata from a release image and should be used by UPI installs (including our many existing automated tests for UPI).
>
> Additionally, we will add `oc adm release coreos-download -p openstack` which will e.g. download the OpenStack `.qcow2`, including verifying its integrity.
Is the platform used here going to match the platforms used in the metadata stream or the platform used by OpenShift? For example, if a user is installing on RHV, would the user use the `openstack` platform or the `ovirt` platform for this command?
Good question! The most user-friendly thing is probably to accept OpenShift platforms and map them to stream metadata, so `-p rhev` or `-p ovirt` maps to `openstack` as it does in IPI. WDYT?
That makes sense to me from a user's standpoint. Will the RHCOS team own that mapping as the installer adds more platforms?
I think this would end up being a small Go library shared between `oc` and `openshift-install`? Or we could glom it onto an existing library like...hmm, library-go? Not sure.
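For illustration, a minimal sketch of what such a shared Go helper could look like; the package name, function, and exact mapping table below are hypothetical, not a settled API:

```go
// Package bootimageplatform is a hypothetical helper that could be shared by
// `oc` and `openshift-install` to map OpenShift platform names onto the
// platform keys used in CoreOS stream metadata.
package bootimageplatform

// streamPlatform maps OpenShift platform names to stream-metadata keys where
// they differ; anything not listed maps to itself.
var streamPlatform = map[string]string{
	// RHV/oVirt installs reuse the OpenStack qcow2, as IPI does today.
	"ovirt": "openstack",
	"rhev":  "openstack",
	// Bare metal IPI consumes the "metal" artifacts.
	"baremetal": "metal",
}

// StreamPlatformFor returns the stream-metadata platform key for the given
// OpenShift platform name.
func StreamPlatformFor(openshiftPlatform string) string {
	if p, ok := streamPlatform[openshiftPlatform]; ok {
		return p
	}
	return openshiftPlatform
}
```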
Looks sane to me. Will help us on several fronts.
enhancements/coreos-bootimages.md (Outdated)

> A new git repository https://github.com/openshift/rhel-coreos-bootimages will be created (based on [an existing prototype repository](https://github.com/cgwalters/rhel-coreos-bootimages)). It will have the stream JSON.
>
> Additionally, this repository will be included in the release image as `rhel-coreos-bootimages` and it will use the CVO to install a `configmap/coreos-bootimages` in the `openshift-machine-config-operator` namespace.
Naming-wise you should use `machine-os-bootimage-config` (and I think your repo should have that name too).
enhancements/coreos-bootimages.md (Outdated)

> The installer needs CoreOS to start at all; today the bootimages are used for the bootstrap node
> and control plane which are created by the installer directly. In order to avoid duplicating the
> metadata, change `oc adm release new` to *binary patch* the included `openshift-install`
> binary, in the same way we [do for the release image](https://github.com/openshift/installer/pull/1422).
Hrm. Patching the installer (like we patch the CLI) is something that is a requirement in our process already, so we wouldn't be worsening it. It's a bit weird because now, to extract the installer, we have to extract something from an image BEFORE we extract the installer. The JSON blob here is also significantly larger than a version string, so there is some ugliness there (the way we do patching has some limitations, but probably wouldn't block this change).

One alternative is having the installer fetch the metadata the same way the oc tools do, reusing our libraries (which is no less safe, but does change a bit of the installer flow). That has some advantages because then the installer could verify other properties of the install, including whether the credentials the user provides have access to the images, but it would not help when installing into restricted environments where the installer doesn't have access to the things that are pulled at all. Given that, I'd probably rule out this option.

A second alternative is to make the user fetch this data themselves and provide it to the installer. This breaks the hermetic quality of the release, and loses the integrity check we get when we install from the installer (right now the shasum of the installer is enough to verify the entire install chain). Because of the loss of integrity, I'd rule out this alternative as well.

A variant of your original option is to have the metadata about where those come from be part of machine-os-content, either as labels on the image, or as a file in machine-os-content, which we extract and place in the payload directly. However, the implication I think from the team is that we would then need to know the final publish locations before building machine-os-content? That might not be insurmountable as a label, but might be higher cost to the release pipeline, so a separate image is maybe ok.

The last alternative is to keep this in the installer, but to have the installer inject the content as a config map on the cluster. Of all the options I prefer that the most, because it requires no additional change to our release tooling and no additional repo, and since this is purely metadata, I'd prefer the data used by the installer to be colocated with the installer binary.

I'd like to hear arguments against my last alternative, because I prefer it over what is described in this enhancement (maybe I missed a reason why we can't do that).
> A variant of your original option is to have the metadata about where those come from be part of machine-os-content, either as labels on the image, or a file in machine-os-content, which we extract and place in the payload directly. However, the implication I think from the team is that we would then need to know the final publish locations before building machine-os-content?

Yes, there are very messy aspects around the buildsystem to include this in `machine-os-content`. But actually another problem is that any time we change `machine-os-content`, that forces clusters to update in-place. For example, I recently went to do a PR to a release branch that added some extra commentary into a systemd unit we ship, but I stopped when I realized it'd pointlessly force a reboot for customers.

A more recent example is that we have a bug in LUKS for 4.7 - this only requires a bootimage update; there's nothing to change for running nodes.

A pathological case here is when we e.g. just have a fix for vSphere and just want to bump that bootimage.

And more generally I want to try harder to (re)split bootimage-vs-`machine-os-content` in our build system, partly for this reason. The other reason to do the split is the converse problem: e.g. we just want to get a kernel/crio/kubelet fix out and that doesn't need new bootimages, but we still make new AMIs for every new kubelet, which is a pointless waste.
Now, there are definitely things we could do to mitigate the "every machine-os-content change rolls the whole cluster for everyone" issue; I'd eventually like to include some sort of metadata around update urgency. There's a huge difference between a kernel CVE update and my new commentary in a systemd unit file.

And we could probably teach the MCO to be more intelligent about this - e.g. right now it just compares the image sha256, but we could go back to trying to have it pull the inner ostree out of the image and skip an update if that hasn't changed (well, that plus extensions). It'd mean we'd still end up causing existing clusters to pull the m-o-c to at least one node and then skip doing anything - wouldn't be too bad. But it starts to grow nontrivial logic in the MCO, and we're near capacity there for nontrivial changes.
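To make the "compare the inner ostree rather than the image sha256" idea concrete, here is a rough sketch of the decision logic only; the rpm-ostree JSON field names are my assumption, and this is not the MCO's actual code:

```go
// Package osupdate sketches how a node agent could decide whether an in-place
// OS update (and reboot) is actually needed, by comparing ostree commits
// rather than machine-os-content image digests.
package osupdate

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// bootedOSTreeCommit returns the ostree commit of the currently booted
// deployment, as reported by `rpm-ostree status --json`. The "booted" and
// "checksum" field names are assumptions about that JSON output.
func bootedOSTreeCommit() (string, error) {
	out, err := exec.Command("rpm-ostree", "status", "--json").Output()
	if err != nil {
		return "", err
	}
	var status struct {
		Deployments []struct {
			Booted   bool   `json:"booted"`
			Checksum string `json:"checksum"`
		} `json:"deployments"`
	}
	if err := json.Unmarshal(out, &status); err != nil {
		return "", err
	}
	for _, d := range status.Deployments {
		if d.Booted {
			return d.Checksum, nil
		}
	}
	return "", fmt.Errorf("no booted deployment found")
}

// NeedsInPlaceUpdate compares the commit shipped in machine-os-content
// (however it is discovered, e.g. via a hypothetical image label) with what
// is currently booted; if they match, the firstboot pivot/reboot could be
// skipped.
func NeedsInPlaceUpdate(desiredCommit string) (bool, error) {
	booted, err := bootedOSTreeCommit()
	if err != nil {
		return false, err
	}
	return booted != desiredCommit, nil
}
```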
A somewhat less good (but quite important to us tactically right now) reason is that we're in the midst of a transition toward having our CI and builds be more Prow-native - but getting `machine-os-content` to work that way all at once is unlikely to happen for 4.8. With a new component, we can add this new git repository now with Prow jobs that gate it (just like openshift/installer has), and we'd suddenly gain for free a ton of OpenShift CI integration that doesn't exist today in our builds until we go to do a PR to openshift/installer.
> The last alternative is to keep this in the installer, but to have the installer inject the content as a config map on the cluster. I think of all the options I prefer that the most because it requires no additional change to our release tooling, and requires no additional repo, and since this is purely metadata, I'd prefer the data used by the installer to be colocated with the installer binary.

Oh sorry, I missed the nuance of this and spent a lot of time replying to the m-o-c bit, but here's what you were really arguing! Sorry.

OK yes, in fact that's exactly what I was working on in openshift/installer#4582, but the reason I pivoted to this is that it gets messy to go from there to solving the "how do UPI installs consume this" problem. And it turns out there are a lot of "automated UPI testing" jobs out there.

Would we add something like `openshift-install print-coreos-stream-metadata` that dumps the JSON? Should UPI (including our automated UPI tests) fetch the metadata from https://mirror.openshift.com instead?

We're kind of here because openshift/installer#2092 was rejected and the installer team lead at the time was strongly of the opinion this should be part of the release image and `oc`.
For the MCO, m-o-c is the source of truth today. About teaching the MCO to skip an OS update that isn't really needed: I think one way would be to add a label (like update:true) to the m-o-c container image that the MCD reads and treats as a no-op. Thinking about this more, my idea is not really going to work.
Clayton and I had a fruitful discussion around his last alternative. I am in agreement with him in preferring that alternative.

A configmap containing the metadata stream would be added as a manifest in the installer image. This configmap would be injected into the cluster in the normal way for image manifests during installs and upgrades.

There would be a command added to `openshift-install` that a user could run to get the stream metadata, for use cases such as UPI installs.
Also of note in the details of the last alternative is that the installer would include the stream metadata configmap in its binary so that the stream metadata used at install-time matches what is included in the release image.
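As a sketch of that last point: assuming the installer embedded the stream JSON at build time, the command discussed above could be as simple as the following (the file name, package, and command wiring are all hypothetical):

```go
// Package coreosstream sketches how openshift-install could carry the CoreOS
// stream metadata in its binary and dump it for UPI consumers.
package coreosstream

import (
	_ "embed"
	"fmt"

	"github.com/spf13/cobra"
)

// The stream metadata JSON baked in at build time; the file name is an
// assumption for illustration.
//
//go:embed rhcos-stream.json
var streamJSON []byte

// NewPrintStreamCommand returns a subcommand (name taken from the discussion
// above) that prints the embedded stream metadata.
func NewPrintStreamCommand() *cobra.Command {
	return &cobra.Command{
		Use:   "print-coreos-stream-metadata",
		Short: "Print the embedded CoreOS bootimage stream metadata as JSON",
		RunE: func(cmd *cobra.Command, args []string) error {
			_, err := fmt.Fprintln(cmd.OutOrStdout(), string(streamJSON))
			return err
		},
	}
}
```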
One angle we could actually take this is to decide that this new git repository will actually become an operator itself in the future, based on the code from https://gitlab.com/cgwalters/openshift-update-bootimages. I don't think we should scope in trying to determine its API right now - perhaps it wouldn't even have an API; it would just, by default in IPI scenarios, detect the case when a cluster is upgraded and automatically update machinesets to the latest bootimage for workers. Then all we'd need is for the MCO firstboot to gain some logic for "do I actually need to OS upgrade and reboot", and we'd gain the scaleup benefits.
> ### Goal 1: Including metadata in the cluster so that we can later write automated IPI/machineAPI updates
>
> Lay the groundwork so that one or both of the MCO or machineAPI operators can act on this data, and e.g. in an AWS cluster update the machineset to use a new base AMI.
I want to understand what sort of changes the MCO would need to make. Bootimages today are ideally handled by machine-api, right?
This topic bridges the two, I think. Indeed the machineAPI would naturally be a good place to update machinesets. But on the flip side, this stuff is completely CoreOS-specific (like a lot of the MCO), whereas the machineAPI is a lot more generic (because it's forked from the upstream generic Kube cluster API).

Also, if the goal is to use this to skip the firstboot OS upgrade, we would want some careful coordination around analyzing the diff between machine-os-content and the bootimages to ensure we don't actually need to do the in-place update. IOW there's no point in doing the machineset update if we still need the firstboot reboot - so that's MCO knowledge.

Or, as I commented below, perhaps this is even a new (small) operator. I don't have a really strong opinion, to be clear. This enhancement just calls for injecting the data so we can later do that work.

(But, the configmap is placed in the `openshift-machine-config-operator` namespace; if you guys object to that or want it changed, please say so!)
> Also, if the goal is to use this to skip the firstboot OS upgrade, we would want some careful coordination around analyzing the diff between machine-os-content and the bootimages to ensure we don't actually need to do the in-place update. IOW there's no point in doing the machineset update if we still need the firstboot reboot - so that's MCO knowledge.

Agree with making the MCO smarter here, to look at OSTree content rather than just checking the OSImageURL.

> This enhancement just calls for injecting the data so we can later do that work.

Right.

> (But, the configmap is placed in the `openshift-machine-config-operator` namespace; if you guys object to that or want it changed, please say so!)

Either the machineAPI or MCO namespace sounds like a reasonable place for this. Based on your reasoning about MCO over machineAPI, I don't see any issue here.
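To make the "act on this data" part concrete, here is a rough sketch of the theoretical automation reading the configmap and looking up an AMI. The configmap data key and the trimmed-down JSON shape are assumptions, and patching the MachineSet providerSpec is left out:

```go
// Package bootimageupdate sketches automation that reads the in-cluster
// CoreOS bootimage metadata and finds the AMI to use for new machines.
package bootimageupdate

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// minimalStream decodes only the fields needed here from the stream JSON
// (architectures.<arch>.images.aws.regions.<region>.image).
type minimalStream struct {
	Architectures map[string]struct {
		Images struct {
			AWS struct {
				Regions map[string]struct {
					Image string `json:"image"`
				} `json:"regions"`
			} `json:"aws"`
		} `json:"images"`
	} `json:"architectures"`
}

// LatestAMI reads configmap/coreos-bootimages and returns the AMI for the
// given arch and region. The "stream" data key is an assumption. A real
// controller would then patch the MachineSet providerSpec to use the result.
func LatestAMI(ctx context.Context, client kubernetes.Interface, arch, region string) (string, error) {
	cm, err := client.CoreV1().ConfigMaps("openshift-machine-config-operator").
		Get(ctx, "coreos-bootimages", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	var s minimalStream
	if err := json.Unmarshal([]byte(cm.Data["stream"]), &s); err != nil {
		return "", err
	}
	ami := s.Architectures[arch].Images.AWS.Regions[region].Image
	if ami == "" {
		return "", fmt.Errorf("no AMI found for %s/%s", arch, region)
	}
	return ami, nil
}
```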
OK there is a smaller fallback plan which is basically: keep going with openshift/installer#4582, except also add code to "back-convert" the stream metadata to … EDIT: An even quicker hack is to simply duplicate the metadata. That's what openshift/installer#4582 is doing as of openshift/installer@c644fe7.
This is a preparatory subset of the larger enhancement for [in-cluster CoreOS bootimage creation](openshift#201). This enhancement calls for a standardized JSON format for (RHEL) CoreOS bootimage metadata to be placed at https://mirror.openshift.com *and* included in a new `rhel-coreos-bootimages` image in the release image.
(force-pushed from 0aad02f to 15c7112)
OK I reworked this to reflect the current state of openshift/installer#4760 (and its dependency openshift/installer#4582) - basically we continue to have the installer be "source of truth" and have it install the configmap rather than introducing a new git repository + binary patching. We also agreed to add a new CLI command to …
> See above - this JSON file will be available in multiple ways for UPI installations.
>
> ### Goal 3: Provide a standardized JSON file for bare metal IPI and baremetal devscripts
Same goes for the assisted installer.

@cgwalters WDYT about adding a similar command to `oc adm`, so the user can have a single flow for mirroring?
https://docs.fedoraproject.org/en-US/fedora-coreos/stream-metadata/ is for FCOS but will hopefully be a useful reference for RHCOS too; in particular it contains a link to https://github.com/coreos/stream-metadata-go for example.
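For a feel of what consuming that looks like, a hedged FCOS example using stream-metadata-go; I'm assuming the library's `FetchStream` helper and struct layout, and an RHCOS consumer would presumably read the equivalent JSON from the release image or configmap rather than a public URL:

```go
package main

import (
	"fmt"
	"log"

	"github.com/coreos/stream-metadata-go/fedoracoreos"
)

func main() {
	// Fetch the published FCOS "stable" stream metadata.
	st, err := fedoracoreos.FetchStream(fedoracoreos.StreamStable)
	if err != nil {
		log.Fatal(err)
	}
	arch, ok := st.Architectures["x86_64"]
	if !ok {
		log.Fatal("no x86_64 architecture in stream")
	}
	// List the per-platform disk artifacts (qemu, metal, openstack, ...) and
	// where they can be downloaded from.
	for platform, artifacts := range arch.Artifacts {
		for format, img := range artifacts.Formats {
			if img.Disk != nil {
				fmt.Printf("%s (%s): %s\n", platform, format, img.Disk.Location)
			}
		}
	}
}
```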
Definitely, I think we should leverage this to improve the mirroring case even more. My view is that having higher level tooling in … I think this leads to having the higher level sugar be e.g. … But that said, once we have that, we could add another level like …
@staebler mind adding an approve/lgtm for this since it's now almost entirely an enhancement for openshift/installer?
/lgtm
@cgwalters I put my lgtm, but I do not have approval rights. You'll have to find someone else for that.
/approve

Thanks for getting this sorted.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: smarterclayton, staebler, travier.
OK there's some active debate on this happening in private chat; trying to make this public here. One thread I think we did lose here is having something like … But either way, here's a strawman: …
Also related to disconnected installs - I don't have much experience with Hive, but I would assume it's pretty oriented towards the release image and extracts the installer binary from it? In that case then indeed exposing an …
Agreed. The admin also needs to mirror an ISO, so it makes sense to use the same tool to determine which ISO to mirror.
@dgoodwin how does hive extract the installer in disconnected environments? For a similar agent-based use case, we're waiting on this enhancement to land to enable us to extract the appropriate installer binary.
I believe we run a pod using the release image and a custom command / sidecar container which extracts the installer image ref from /release-manifests/image-references. We then launch the installer pod using some of those image refs (installer/cli/etc.).
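For context, a minimal sketch of that lookup; it assumes /release-manifests/image-references is an ImageStream-shaped JSON document and decodes just the fields needed to find a tag's pullspec:

```go
// Package releaserefs sketches how a pod could find the installer image
// pullspec from the release image's /release-manifests/image-references file.
package releaserefs

import (
	"encoding/json"
	"fmt"
	"os"
)

// imageReferences decodes only the tag name and pullspec from the
// ImageStream-shaped document.
type imageReferences struct {
	Spec struct {
		Tags []struct {
			Name string `json:"name"`
			From struct {
				Name string `json:"name"`
			} `json:"from"`
		} `json:"tags"`
	} `json:"spec"`
}

// PullspecForTag returns the image pullspec for a tag such as "installer".
func PullspecForTag(path, tag string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	var refs imageReferences
	if err := json.Unmarshal(data, &refs); err != nil {
		return "", err
	}
	for _, t := range refs.Spec.Tags {
		if t.Name == tag {
			return t.From.Name, nil
		}
	}
	return "", fmt.Errorf("tag %q not found in %s", tag, path)
}
```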
Though will Hive ever care about the bootimage? All of Hive is IPI, right? So having an interface for this in …
Bootimage aka RHCOS image? We are integrating with the Agent/Assisted installer, so I could easily see a world where we also extract the RHCOS image to provide to the Agent installer.
Today a disconnected hive user has to: …

Since the RHCOS image is not included as part of the payload (and isn't planned to be AFAIK), there's not much hive can do to simplify the above, unless we add a new feature that tracks a collection of available RHCOS boot images and tries to match the best one with a particular release image at install time.
OK then right, indeed you could use it. But, would having the data in …
The release image would be the best fit for how we do things today; oc and openshift-install may be workable but will take some effort.
Reusing this as a handy place to answer a semi-FAQ. Q: "Will this allow skipping the initial os update + reboot?" There's a whole chain of stuff that needs to happen for that; first we need to productize https://gitlab.com/cgwalters/openshift-update-bootimages and pick a place to ship it. Then, the next problem is that the RHCOS build pipeline isn't producing new bootimages for every update; when a new kernel CVE comes out we don't want to require regenerating everything - we just ship a new oscontainer. So it's going to happen by default that e.g. the AMIs in the metadata are old. Now, we could change that, but when we do we run into a much deeper problem, and this is where #201 is - if we do that, it won't apply to on-premise installs (OpenStack, vSphere, bare metal) until we land code somewhere that interacts with the underlying infra to update the bootimage. And doing this by default in public cloud but not on premise risks creating a large feature divergence.
In openshift/enhancements#679 we landed support for a stream metadata format already used by FCOS and now by RHCOS/OCP consumers to find the bootimages. We kept the old metadata because the UPI CI jobs used it. In openshift/release#17482 I tried to port the UPI CI jobs, but ran into huge levels of technical debt. As best I can tell, the UPI CI jobs are not running on this repo now and are likely broken for other reasons. Let's remove the old data and avoid the confusing duplication. Anyone who goes to work on the UPI CI jobs and sanitizes things should be able to pick up the work to port to stream metadata relatively easily.