
coreos-bootimage-streams: Standardized CoreOS bootimage metadata #679

Merged

Conversation

cgwalters
Member

This is a preparatory subset of the larger enhancement for in-cluster CoreOS bootimage creation.

This enhancement calls for a standardized JSON format for (RHEL) CoreOS bootimage metadata to be placed at https://mirror.openshift.com and shipped in a new rhel-coreos-bootimages image included in the release image.


#### Story 1

An OpenShift core developer can write a PR which reads the configmap from the cluster and acts on it to update the machinesets to e.g. use a new AMI.
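As a rough illustration only, such automation might start by reading the stream metadata out of the configmap proposed here (the "stream" data key and the jq paths are assumptions modeled on the FCOS stream-metadata layout):

# Sketch: read the bootimage metadata this enhancement installs and pull out an AMI.
# The configmap name/namespace come from this proposal; the "stream" data key and
# JSON paths are assumptions based on the FCOS stream format.
oc -n openshift-machine-config-operator get configmap/coreos-bootimages \
    -o jsonpath='{.data.stream}' \
  | jq -r '.architectures.x86_64.images.aws.regions["us-east-1"].image'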
Member

This use case raises a question: when I grow a MachineSet, is my cluster required to boot new hosts using the OS image referenced in the configmap? Or can it use an older one? How much older?

I understand that one of your goals (described in #201 and maybe worthwhile to repeat as a goal on this proposal) is to "Avoid accumulation of technical debt across OS updates". If I grow my 4.9 cluster, you don't want me using a 4.2 boot image. Makes sense. But this proposal is just helping my cluster understand what the very latest boot image is for its current cluster version. It does not help to set a floor on the supported boot image version, unless this implies that the cluster must use only the image referenced in the configmap.

Let's say my cluster is not connected to the internet. I've figured out how to mirror openshift releases into my local container registry to facilitate cluster updates. Maybe I even have a local cincinnati deployed. With every cluster update, must I also now download the latest boot image and place it on my local http mirror?

(I assume that somewhere else we'll solve the problem of how the theoretical automation looking at the configmap and updating MachineSets knows about my local OS image mirror and does some sort of translation, similar to how ImageContentSourcePolicies work.)

Member Author

I think this is covered by the "Non-goal: Replacing the default in-place update path". So yes, we can't require it anytime soon.

Member Author

(Reusing this sub-thread to just say thanks @mhrivnak for the thoughtful review!)

romfreiman

@mhrivnak but it can be an optimization. I guess we can start relying on it for ZTP RWN flow. @avishayt @ronniel1 @celebdor @tsorya WDYT?

Member

@romfreiman if by "optimization" you mean we might be able to skip the early pivot by having the latest boot ISO, I'm not sure that will work out very often. @cgwalters:

  1. Is it ever possible that the early pivot can be skipped because the boot image has the same version as the machine-os-content container?
  2. If "yes", any thoughts on how often that might be the case?

This is just in the context of bare metal where an extra reboot costs up to 15 minutes, plus however long it takes to retrieve and apply the machine-os-content update. So it's desirable to write the correct RHCOS version to disk the first time if possible.

@cgwalters
Member Author

OK I updated this one to call for binary patching the installer, the same way we do for the release image, instead of duplicating the bootimages there. That gets openshift/installer entirely out of the rhcos.json business, which I think will be a win for everyone.

metadata, change `oc adm release new` to *binary patch* the included `openshift-install`
binary, in the same way we [do for the release image](https://github.com/openshift/installer/pull/1422).

The RHCOS team will take over maintenance, CI, and automation of `rhel-coreos-bootimages`. For OKD,
Member

The RHCOS team will take over maintenance, CI, and automation of rhel-coreos-bootimages.

What does this entail?

Member Author

To start, we would do PRs to that repository to bump it in the same way we do openshift/installer. See the prototype at https://github.com/cgwalters/rhel-coreos-bootimages

But we could switch to having a bot do PRs after new cosa builds happen, or something else.


The installer needs CoreOS to start at all; today the bootimages are used for the bootstrap node
and control plane which are created by the installer directly. In order to avoid duplicating the
metadata, change `oc adm release new` to *binary patch* the included `openshift-install`
Contributor

Will the binary patch inject the entire stream metadata into the installer? What is the process that a developer (or a libvirt user) running a locally built installer would use? Would they be expected to do a binary patch as well?

Member Author

Today, when built from git master, the installer defaults to an OKD release image while still pinning the RHCOS bootimages from rhcos.json, which just doesn't work at all - this is a common stumbling block for people.

I think probably the simplest thing here is that when building from the git master branch, we default to using oc to binary patch in the latest release image data.

Contributor

I am confused about the statement that it "just doesn't work at all". I frequently do successful installations using the installer built from git master without needing to make any changes for the RHCOS image.

Member Author

When you run your locally-built installer, do you provide env OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE?

What I and others have hit is pretty simple:

The default release image is OKD:

$ oc image info registry.ci.openshift.org/origin/release:4.8 | grep -i release=
Labels:      io.openshift.release=4.8.0-0.okd-2021-03-07-094455
$

Yet the default pinned bootimages are RHCOS. That means e.g. the bootstrap node is RHCOS yet we're pulling down the OKD release image, which has caused various problems, at least in the past.

We added support to xokdinst to streamline overriding the release image, to help with this for example.


This command will extract the CoreOS bootimage stream metadata from a release image and should be used by UPI installs (including our many existing automated tests for UPI).

Additionally, we will add `oc adm release coreos-download -p openstack` which will e.g. download the OpenStack `.qcow2`, including verifying its integrity.
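As a rough, by-hand sketch of that download-and-verify step, assuming the stream metadata follows the FCOS layout (the artifact format key, e.g. qcow2.gz, is an assumption for RHCOS):

# stream.json here stands for the metadata extracted from the release image.
loc=$(jq -r '.architectures.x86_64.artifacts.openstack.formats["qcow2.gz"].disk.location' stream.json)
sum=$(jq -r '.architectures.x86_64.artifacts.openstack.formats["qcow2.gz"].disk.sha256' stream.json)
curl -LO "$loc"
echo "$sum  $(basename "$loc")" | sha256sum -c -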
Contributor

Is the platform used here going to match the platforms used in the metadata stream or the platform used by OpenShift? For example, if a user is installing on RHV, would the user use the openstack platform or the ovirt platform for this command?

Member Author

Good question! The most user-friendly thing is probably to accept OpenShift platforms and map them to stream metadata platforms, so -p rhev or -p ovirt maps to openstack, as it does in IPI. WDYT?
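For illustration only, the mapping under discussion could be as small as this (the platform names are assumptions; where the code would live is discussed below):

# Hypothetical sketch: translate an OpenShift platform flag into the platform
# name used in the stream metadata.
stream_platform() {
  case "$1" in
    rhev|ovirt|openstack) echo openstack ;;  # RHV/oVirt boots the OpenStack image, as in IPI
    baremetal)            echo metal ;;
    *)                    echo "$1" ;;       # other platforms pass through unchanged in this sketch
  esac
}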

Contributor

That makes sense to me from a user's standpoint. Will the RHCOS team own that mapping as the installer adds more platforms?

Member Author

I think this would end up being a small Go library shared between oc and openshift-install? Or we could glom it onto an existing library like...hmm, library-go? Not sure.

@travier (Member) left a comment

Looks sane to me. Will help us on several fronts.


A new git repository https://github.com/openshift/rhel-coreos-bootimages will be created (based on [an existing prototype repository](https://github.com/cgwalters/rhel-coreos-bootimages)). It will have the stream JSON.

Additionally, this repository will be included in the release image as `rhel-coreos-bootimages` and it will use the CVO to install a `configmap/coreos-bootimages` in the `openshift-machine-config-operator` namespace.
Contributor

Naming-wise this should be machine-os-bootimage-config (and I think your repo should have that name too).

The installer needs CoreOS to start at all; today the bootimages are used for the bootstrap node
and control plane which are created by the installer directly. In order to avoid duplicating the
metadata, change `oc adm release new` to *binary patch* the included `openshift-install`
binary, in the same way we [do for the release image](https://github.com/openshift/installer/pull/1422).
@smarterclayton (Contributor) commented Mar 10, 2021

Hrm. Patching the installer (like we patch the CLI) is something that is a requirement in our process already, so we wouldn't be worsening it. It's a bit weird because now to extract the installer we have to extract something from an image BEFORE we extract the installer. The JSON blob here is also significantly larger than a version string, so there is some ugliness there (the way we do patching has some limitations, but probably wouldn't block this change).

One alternative is having the installer fetch the metadata the same way the oc tools do, reusing our libraries (which is no less safe, but does change a bit of the installer flow). That has some advantages because then the installer could verify other properties of the install, including whether the credentials the user provides have access to the images, but it would not help when installing into restricted environments where the installer doesn't have access to the things being pulled at all. Given that, I'd probably rule out this option.

A second alternative is to make the user fetch this data themselves and provide it to the installer. This breaks the hermetic quality of the release, and loses the integrity check we get when we install from the installer (right now the shasum of the installer is enough to verify the entire install chain). Because of the loss of integrity, I'd rule out this alternative as well.

A variant of your original option is to have the metadata about where those come from be part of machine-os-content, either as labels on the image, or a file in machine-os-content, which we extract and place in the payload directly. However, the implication I think from the team is that we would then need to know the final publish locations before building machine-os-content? That might not be insurmountable as a label, but might be higher cost to the release pipeline, and so a separate image is maybe ok.

The last alternative is to keep this in the installer, but to have the installer inject the content as a config map on the cluster. I think of all the options I prefer that the most because it requires no additional change to our release tooling, and requires no additional repo, and since this is purely metadata, I'd prefer the data used by the installer to be colocated with the installer binary.

I'd like to hear arguments against my last alternative, because I prefer it over what is described in this enhancement (maybe I missed a reason why we can't do that)

Member Author

A variant of your original option is to have the metadata about where those come from be part of machine-os-content, either as labels on the image, or a file in machine-os-content, which we extract and place in the payload directly. However, the implication I think from the team is that we would then need to know the final publish locations before building machine-os-content?

Yes, there are very messy aspects around the build system to include this in machine-os-content. But actually another problem is that any time we change machine-os-content, that forces clusters to update in-place. For example, I recently went to do a PR to a release branch that added some extra commentary to a systemd unit we ship, but I stopped when I realized it'd pointlessly force a reboot for customers.

A more recent example is that we have a bug in LUKS for 4.7 - this only requires a bootimage update; there's nothing to change for running nodes.

A pathological case here is when we have a fix only for vSphere and just want to bump that bootimage.

And more generally I want to try harder to (re)split bootimages vs. machine-os-content in our build system, partly for this reason. The other reason to do the split is the converse problem: e.g. we just want to get a kernel/crio/kubelet fix out, which doesn't need new bootimages, but we still make new AMIs for every new kubelet, which is a pointless waste.

@cgwalters (Member Author) commented Mar 10, 2021

Now, there are definitely things we could do to mitigate the "every machine-os-content change rolls the whole cluster for everyone" issue; I'd eventually like to include some sort of metadata around update urgency. There's a huge difference between a kernel CVE update and my new commentary in a systemd unit file.

And we could probably teach the MCO to be more intelligent about this - e.g. right now it just compares image sha256, but we could go back to trying to have it pull the inner ostree out of the image and skip an update if that hasn't changed. (Well, that plus extensions). It'd mean we'd still end up causing existing clusters to pull the m-o-c to at least one node, and then skip doing anything - wouldn't be too bad. But it starts to grow nontrivial logic in the MCO and we're near capacity there for nontrivial changes.
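A back-of-the-envelope version of that check might look like the following, assuming machine-os-content records its ostree commit as an image label (the label name and JSON path here are assumptions):

# Compare the ostree commit carried by two machine-os-content images; if they match,
# the in-place OS update (and reboot) could in principle be skipped.
old=$(oc image info "$OLD_MOC" -o json | jq -r '.config.config.Labels["com.coreos.ostree-commit"]')
new=$(oc image info "$NEW_MOC" -o json | jq -r '.config.config.Labels["com.coreos.ostree-commit"]')
[ "$old" = "$new" ] && echo "ostree content unchanged; update would be a no-op (modulo extensions)"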

@cgwalters (Member Author) commented Mar 10, 2021

A somewhat less good (but tactically quite important to us right now) reason is that we're in the midst of a transition toward making our CI and builds more Prow-native, but getting machine-os-content to work that way all at once is unlikely to happen for 4.8. With a new component, we could add this new git repository now, have Prow jobs that gate it (just like openshift/installer does), and suddenly gain for free a ton of OpenShift CI integration that today doesn't exist in our builds until we go do a PR to openshift/installer.

@cgwalters (Member Author) commented Mar 10, 2021

The last alternative is to keep this in the installer, but to have the installer inject the content as a config map on the cluster. I think of all the options I prefer that the most because it requires no additional change to our release tooling, and requires no additional repo, and since this is purely metadata, I'd prefer the data used by the installer to be colocated with the installer binary.

Oh sorry I missed the nuance of this and spent a lot of time replying to the m-o-c bit but here's what you were really arguing! Sorry.

OK yes, in fact that's exactly what I was working on in openshift/installer#4582, but the reason I pivoted to this is that it gets messy to go from there to solving the "how do UPI installs consume this" problem.
And it turns out there are a lot of "automated UPI testing" jobs out there.

Would we add something like openshift-install print-coreos-stream-metadata that dumps the JSON?

Should UPI (including our automated UPI tests) fetch the metadata from https://mirror.openshift.com instead?

We're kind of here because openshift/installer#2092 was rejected and the installer team lead at the time was strongly of the opinion this should be part of the release image and oc.

@sinnykumari (Contributor) commented Mar 11, 2021

For the MCO, m-o-c is the source of truth today. About teaching the MCO to skip a no-op OS update, I think one way would be to add a label (like update:true) to the m-o-c container image that the MCD reads and treats as a no-op. Thinking about this more, though, my idea is not really going to work.

Contributor

Clayton and I had a fruitful discussion around his last alternative. I am in agreement with him in preferring that alternative.

A configmap containing the metadata stream would be added as a manifest in the installer image. This configmap would be injected into the cluster in the normal way for image manifests during installs and upgrades.

There would be a command added to openshift-install that a user could run to get the stream metadata, for use cases such as UPI installs.
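For example, a bare-metal UPI flow could then look something like this (the subcommand name and JSON paths are placeholders based on the proposals in this thread and the FCOS stream layout):

# Dump the stream metadata from the installer and pick out the live ISO location.
openshift-install coreos print-stream-json \
  | jq -r '.architectures.x86_64.artifacts.metal.formats.iso.disk.location'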

Contributor

Also of note in the details of the last alternative is that the installer would include the stream metadata configmap in its binary so that the stream metadata used at install-time matches what is included in the release image.

@cgwalters
Member Author

One angle we could take here is to decide that this new git repository will actually become an operator itself in the future, based on the code from https://gitlab.com/cgwalters/openshift-update-bootimages

I don't think we should scope in trying to determine its API right now - perhaps it wouldn't even have an API. By default in IPI scenarios it would just detect when a cluster is upgraded and automatically update machinesets to the latest bootimage for workers. Then all we'd need is for the MCO firstboot to gain some logic for "do I actually need to OS update and reboot?" and we'd gain the scaleup benefits.


### Goal 1: Including metadata in the cluster so that we can later write automated IPI/machineAPI updates

Lay the groundwork so that one or both of the MCO or machineAPI operators can act on this data, and e.g. in an AWS cluster update the machineset to use a new base AMI.
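As a sketch of what that later automation might boil down to on AWS (the MachineSet name and AMI are placeholders; the providerSpec path is the standard AWS machine-api layout):

# Point an existing AWS MachineSet at a newer boot AMI taken from the stream metadata.
oc -n openshift-machine-api patch machineset/<worker-machineset> --type=json -p '[
  {"op": "replace",
   "path": "/spec/template/spec/providerSpec/value/ami/id",
   "value": "ami-<new-bootimage>"}
]'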
Contributor

I want to understand what sort of changes the MCO would need to make. Bootimages today are ideally handled by machine-api, right?

Member Author

This topic bridges the two, I think. Indeed the machineAPI would naturally be a good place to update machinesets. But on the flip side, this stuff is completely CoreOS-specific (like a lot of the MCO), whereas the machineAPI is a lot more generic (because it's forked from the upstream generic Kube cluster API).

Also if the goal is to use this to skip the firstboot OS upgrade, we would want some careful coordination around analyzing the diff between machine-os-content and the bootimages to ensure we don't actually need to do the in-place update. IOW there's no point in doing the machineset update if we still need the firstboot reboot - so that's MCO knowledge.

Or as I commented below, perhaps this is even a new (small) operator. I don't have a really strong opinion, to be clear. This enhancement just calls for injecting the data so we can later do that work.

(But, the configmap is placed in the openshift-machine-config-operator namespace; if you guys object to that or want it changed please say so!)

Contributor

Also if the goal is to use this to skip the firstboot OS upgrade, we would want some careful coordination around analyzing the diff between machine-os-content and the bootimages to ensure we don't actually need to do the in-place update. IOW there's no point in doing the machineset update if we still need the firstboot reboot - so that's MCO knowledge.

Agree with making the MCO smarter here to look at the OSTree content rather than just checking the OSImageURL.

This enhancement just calls for injecting the data so we can later do that work.

right

(But, the configmap is placed in the openshift-machine-config-operator namespace; if you guys object to that or want it changed please say so!)

Either the machineAPI or MCO namespace sounds like a reasonable place for this. Based on your reasoning about the MCO over the machineAPI, I don't see any issue here.

@cgwalters
Member Author

cgwalters commented Mar 11, 2021

OK there is a smaller fallback plan which is basically:

Keep going with openshift/installer#4582, except also add code to "back-convert" the stream metadata to rhcos.json in exactly the same place and format for now. This effectively decouples the "get metadata in cluster" and "port all the UPI flows" problems.

EDIT: An even quicker hack is to simply duplicate the metadata. That's what openshift/installer#4582 is doing as of openshift/installer@c644fe7

@cgwalters force-pushed the coreos-bootimages-prep branch from 0aad02f to 15c7112 on March 16, 2021 19:32
@cgwalters
Member Author

OK I reworked this to reflect the current state of openshift/installer#4760 (and its dependency openshift/installer#4582) - basically we continue to have the installer be the "source of truth" and have it install the configmap, rather than introducing a new git repository + binary patching.

We also agreed to add a new CLI command to openshift-install, which unblocks porting UPI flows etc.


See above - this JSON file will be available in multiple ways for UPI installations.

### Goal 3: Provide a standardized JSON file for bare metal IPI and baremetal devscripts


Same goes for the assisted installer.

@romfreiman

@cgwalters WDYT about adding a similar command to oc adm? So the user can have a single flow for mirroring?

  • oc adm release mirror
  • oc adm release print-stream-json or whatever?

And later on also add a mirror command. I think it reduces the requirement of extracting the installer and then using it as a second step.
Let's not block this enhancement on this though :)

@cgwalters
Member Author

https://docs.fedoraproject.org/en-US/fedora-coreos/stream-metadata/ is for FCOS but will hopefully be a useful reference for RHCOS too; in particular it contains a link to https://github.com/coreos/stream-metadata-go for example.
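For a concrete feel of the format, the published FCOS stream can be queried directly; the RHCOS stream metadata is expected to follow the same schema:

$ curl -s https://builds.coreos.fedoraproject.org/streams/stable.json \
    | jq -r '.architectures.x86_64.images.aws.regions["us-east-1"].image'
ami-...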

@cgwalters
Member Author

oc adm release mirror

Definitely I think we should leverage this to improve the mirroring case even more. My view is that openshift-install coreos print-stream-json is the raw enablement that allows us to write higher level tooling.

Having higher-level tooling in oc gets slightly awkward here because now we do indeed have to indirectly scrape the openshift-install binary out of the release image.

I think this leads to having the higher-level sugar be e.g. openshift-install coreos download -p openstack, which would write rhcos-$version-openstack.qcow2 to the current directory.

But that said, once we have that we could add another level like oc adm release coreos-download -p openstack which would extract the openshift-install binary and run it.
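Concretely, that layering would look roughly like this (the coreos subcommands are the ones proposed above, not commands that exist today):

# oc already knows how to pull the installer binary out of a release image...
oc adm release extract --command=openshift-install --to=. "$RELEASE_IMAGE"
# ...and the proposed sugar would then be a thin wrapper around:
./openshift-install coreos print-stream-json
./openshift-install coreos download -p openstack   # proposed; writes the qcow2 locally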

@cgwalters
Member Author

@staebler mind adding an approve/lgtm for this since it's now almost entirely an enhancement for openshift/installer?

@staebler (Contributor) left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 22, 2021
@staebler
Contributor

@staebler mind adding an approve/lgtm for this since it's now almost entirely an enhancement for openshift/installer?

@cgwalters I put my lgtm, but I do not have approval rights. You'll have to find someone else for that.

@smarterclayton
Contributor

/approve

Thanks for getting this sorted.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton, staebler, travier


@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2021
@openshift-merge-robot openshift-merge-robot merged commit c8cc94e into openshift:master Mar 23, 2021
@cgwalters
Member Author

OK there's some active debate on this happening in private chat, trying to make this public here:

One thread I think we did lose here is having something like oc adm release coreos print-stream-json - i.e. supporting oc as the frontend to this data, in addition to openshift-install. I think the pivotal aspect here is that for disconnected installs, it makes total sense to use oc - the admin already had to mirror the release image. For e.g. AWS UPI installs, the admin just wants to script finding the AMI; they have to use openshift-install for other things like the Ignition configs, but otherwise don't need oc (I think). So the UPI path is IMO going to be more ergonomic with openshift-install.

But either way here's a strawman:

oc adm release coreos print-stream-json registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-03-26-054333
or so.

@cgwalters
Member Author

Also related to disconnected installs - I don't have much experience with Hive but I would assume it's pretty oriented towards the release image and extracts the installer binary from it? In that case then indeed exposing an oc CLI for this would be better for Hive - is that right?

@mhrivnak
Member

I think the pivotal aspect here is for disconnected installs, it makes total sense to use oc - the admin already had to mirror the release image.

Agreed. The admin also needs to mirror an ISO, so it makes sense to use the same tool to determine which ISO to mirror.

it's pretty oriented towards the release image and extracts the installer binary from it? In that case then indeed exposing an oc CLI for this would be better for Hive - is that right?

@dgoodwin how does hive extract the installer in disconnected environments? For a similar agent-based use case, we're waiting on this enhancement to land to enable us to extract the appropriate installer binary.

@dgoodwin
Contributor

how does hive extract the installer in disconnected environments?

I believe we run a pod using the release image and a custom command / sidecar container which extracts the installer image ref from the /release-manifests/image-references.

We then launch the installer pod using some of those image refs. (installer/cli/etc)

@cgwalters
Member Author

Though will Hive ever care about the bootimage? All Hive is IPI, right? So having an interface for this in openshift-install vs oc may not actually be relevant to Hive?

@dgoodwin
Contributor

Bootimage aka RHCOS image? We are integrating with the Agent/Assisted installer so I could easily see a world where we also extract the RHCOS image to provide to the Agent installer.

@mhrivnak
Member

Today a disconnected hive user has to:

  1. read documentation to figure out where the RHCOS images live
  2. go there and download one
  3. make it available via http inside their network
  4. add the resulting internal URL to the InstallConfig that they provide to hive.

Since the RHCOS image is not included as part of the payload (and isn't planned to be AFAIK), there's not much hive can do to simplify the above, unless we add a new feature that tracks a collection of available RHCOS boot images and tries to match the best one with a particular release image at install time.

@cgwalters
Member Author

OK then right, indeed you could use it. But, would having the data in oc vs openshift-install be better? Does it matter?

@dgoodwin
Contributor

The release image would be the best fit for how we do things today; oc and openshift-install may be workable but will take some effort.

@cgwalters
Member Author

Reusing this as a handy place to answer a semi-FAQ:

Q: "Will this allow skipping the initial os update + reboot"?
A: Not by default, and not at all until the MCO is patched.

There's a whole chain of stuff that needs to happen for that; first we need to productize https://gitlab.com/cgwalters/openshift-update-bootimages and pick a place to ship it.

Then, the next problem is that the RHCOS build pipeline isn't producing new bootimages for every update; when a new kernel CVE comes out we don't want to require regenerating everything - we just ship a new oscontainer. So it's going to happen by default that e.g. the AMIs in the metadata are old.

Now, we could change that, but then we run into a much deeper problem - and this is where #201 comes in: if we do that, it won't apply to on-premise installs (OpenStack, vSphere, bare metal) until we land code somewhere that interacts with the underlying infra to update the bootimage. And doing this by default in public cloud but not on-premise risks creating a large feature divergence.

cgwalters added a commit to cgwalters/installer that referenced this pull request Oct 21, 2021
In openshift/enhancements#679
we landed support for a stream metadata format already used
by FCOS and now by RHCOS/OCP consumers to find the bootimages.

We kept the old metadata because the UPI CI jobs used it.
In openshift/release#17482 I tried
to port the UPI CI jobs, but ran into huge levels of technical debt.

As best I can tell, the UPI CI jobs are not running on this repo
now and are likely broken for other reasons.

Let's remove the old data and avoid the confusing duplication.
Anyone who goes to work on the UPI CI jobs and sanitizes things
should be able to pick up the work to port to stream metadata
relatively easily.
clnperez pushed a commit to clnperez/installer that referenced this pull request Nov 2, 2021
jstuever pushed a commit to jstuever/release that referenced this pull request Nov 8, 2021