coreos-bootimage-streams: Standardized CoreOS bootimage metadata #679
openshift-merge-robot merged 1 commit into openshift:master from cgwalters:coreos-bootimages-prep on Mar 23, 2021
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,147 @@ | ||
--- | ||
title: coreos-bootimage-streams | ||
authors: | ||
- "@cgwalters" | ||
reviewers: | ||
- "@coreos-team" | ||
approvers: | ||
- "@coreos-team" | ||
creation-date: 2021-03-04 | ||
last-updated: 2021-03-04 | ||
status: provisional | ||
--- | ||
|
||
# Standardized CoreOS bootimage metadata | ||
|
||
This is a preparatory subset of the larger enhancement for [in-cluster CoreOS bootimage creation](https://github.com/openshift/enhancements/pull/201). | ||
|
||
This enhancement calls for a standardized JSON format for (RHEL) CoreOS bootimage metadata to be available via 3 distinct mechanisms: | ||
|
||
- In cluster as a ConfigMap: `oc -n openshift-machine-config-operator get configmap/coreos-bootimages` | ||
- Via `openshift-install coreos print-stream-json` | ||
- At https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos-4.8.json | ||
|
||
## Release Signoff Checklist | ||
|
||
- [ ] Enhancement is `implementable` | ||
- [ ] Design details are appropriately documented from clear requirements | ||
- [ ] Test plan is defined | ||
- [ ] Graduation criteria for dev preview, tech preview, GA | ||
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) | ||
|
||
## Summary | ||
|
||
Since the initial release of OpenShift 4, we have "pinned" RHCOS bootimage metadata inside [openshift/installer](https://github.com/openshift/installer). | ||
In combination with the binding between the installer and release image, this means that everything needed to install OpenShift (including the operating system "bootimages" such as AMIs and OpenStack `.qcow2` files) is captured behind the release image, which we can test and ship as an atomic unit. | |
|
||
We have a mechanism to do [in place updates](https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md), but there is no automated mechanism to update "bootimages" past a cluster installation. | ||
|
||
This enhancement does not describe an automated mechanism to do this: the initial goal is to include this metadata in a standardized format in the cluster and at mirror.openshift.com so that UPI installations can do this manually, and we can start work on an IPI mechanism. | ||
|
||
### Stream metadata format | ||
|
||
As part of unifying Fedora CoreOS and RHEL CoreOS, we have standardized on the "stream metadata" format used by FCOS. More in [FCOS docs](https://docs.fedoraproject.org/en-US/fedora-coreos/getting-started/) and [this RHCOS issue](https://github.com/openshift/os/issues/477). | ||
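To make the shape of the format concrete, here is a minimal sketch in Python. It assumes the layout used by FCOS stream metadata (`architectures`, then per-architecture `artifacts` and `images`); the document below is a heavily trimmed, hypothetical example and every value in it is a placeholder, not real RHCOS data. `list_platforms` is a helper invented for this illustration.

```python
import json

# Hypothetical, heavily trimmed stream document; all values are placeholders.
EXAMPLE_STREAM = json.loads("""
{
  "stream": "example",
  "architectures": {
    "x86_64": {
      "artifacts": {
        "vmware": {
          "release": "0.0.0-example",
          "formats": {
            "ova": {
              "disk": {
                "location": "https://example.invalid/rhcos-vmware.x86_64.ova",
                "sha256": "0000"
              }
            }
          }
        }
      },
      "images": {
        "aws": {
          "regions": {
            "us-east-1": {"release": "0.0.0-example", "image": "ami-00000000"}
          }
        }
      }
    }
  }
}
""")


def list_platforms(stream):
    """Map each architecture to the sorted platform names it mentions.

    Platforms can appear under "artifacts" (downloadable files) or under
    "images" (pre-uploaded cloud images), so both are merged here.
    """
    result = {}
    for arch, data in stream.get("architectures", {}).items():
        names = set(data.get("artifacts", {})) | set(data.get("images", {}))
        result[arch] = sorted(names)
    return result
```

Calling `list_platforms(EXAMPLE_STREAM)` on the placeholder document gives one entry for `x86_64` covering both `aws` and `vmware`. Real consumers written in Go would more likely use the stream-metadata-go library than hand-rolled parsing like this.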
|
||
There is a new [stream-metadata-go](https://github.com/coreos/stream-metadata-go) library to consume this data. | ||
|
||
### Add stream metadata to openshift/installer | ||
|
||
We will continue to have the openshift/installer git repository be the source of truth for pinned RHCOS boot images. | ||
|
||
However, we will convert the data there to stream metadata JSON, and port the IPI installer flow to use that. | ||
|
||
### Add openshift-install coreos print-stream-json | ||
|
||
A new command to simply dump this JSON can be used by UPI installs. | ||
|
||
### Update the `installer` container image to inject a configmap | ||
|
||
In order to work on automated in-place bootimage updates, the data needs to be lifecycled | ||
with the cluster release image. There is already an `installer` image as part of the | ||
release payload that just contains the installer binary today. This enhancement | ||
calls for adding a `manifests/` directory to that image and having the CVO treat it as a minimal | |
"operator" that just updates the configmap. | ||
|
||
This enhancement calls for installing this configmap into the `openshift-machine-config-operator` namespace; | ||
logically the functionality is split between machineAPI and MCO, but it doesn't ultimately matter | ||
which namespace has the configmap. | ||
|
||
|
||
### Data available at https://mirror.openshift.com | ||
|
||
The way we deliver bootimages at http://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.7/latest/ is not friendly to automation. By placing the stream metadata JSON there, we gain a standardized machine-readable format. | ||
|
||
The ART team will synchronize the stream metadata with the data embedded in the latest release image for a particular OpenShift minor. | ||
|
||
|
||
## Motivation | ||
|
||
### Goal 1: Including metadata in the cluster so that we can later write automated IPI/machineAPI updates | ||
|
||
Lay the groundwork so that one or both of the MCO or machineAPI operators can act on this data, and e.g. in an AWS cluster update the machineset to use a new base AMI. | ||
|
||
### Goal 2: Provide a standardized JSON file for UPI installs | |
|
||
|
||
See above - this JSON file will be available in multiple ways for UPI installations. | ||
|
||
### Goal 3: Provide a standardized JSON file for bare metal IPI and baremetal devscripts | ||
Review comment: Same goes for the assisted installer. |
||
|
||
Bare metal IPI and [baremetal devscripts](https://github.com/openshift-metal3/dev-scripts/blob/7e4800462fa7e71aaa9e4a7f4eb10166a6b1789c/rhcos.sh#L14) also | ||
parse the existing `rhcos.json`; it will be significantly better for them to use the | ||
`openshift-install coreos print-stream-json` command. | ||
### Non-Goals | ||
|
||
#### Replacing the default in-place update path | ||
|
||
In-place updates as [managed by the MCO](https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md) today work fairly seamlessly. | |
We can't require that everyone fully reprovision a machine in order to do in-place updates - that makes updates *much* more expensive, particularly on bare metal environments. | ||
It implies re-downloading all container images, etc. | ||
|
||
Today in OpenShift 4, the control plane is also somewhat of a "pet" - we don't have streamlined support for reprovisioning control plane nodes even in IaaS/cloud and hence must continue to do in-place updates. | ||
|
||
Along with the [larger in-cluster bootimage generation enhancement](https://github.com/openshift/enhancements/pull/201), we hope to streamline bootimage updates sufficiently that at some point we could *require* | |
newly scaled up workers to use them. But this enhancement will not add any such requirement. | ||
|
||
### User Stories | ||
|
||
#### Story 1 | ||
|
||
An OpenShift core developer can write a PR which reads the configmap from the cluster and acts on it to update the machinesets to e.g. use a new AMI. | ||
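As a rough sketch of the read side of this story, the Python function below pulls an AMI ID out of the ConfigMap. It is an illustration under stated assumptions, not the planned implementation: the ConfigMap data key (`stream`) is not specified by this enhancement and is assumed here, the JSON layout is assumed to follow the FCOS stream metadata format, and `ami_from_configmap` is a hypothetical helper name.

```python
import json


def ami_from_configmap(configmap, region, arch="x86_64", key="stream"):
    """Return the AMI ID for `region`, or None if the stream lists none.

    `configmap` is the dict obtained by parsing the JSON output of
    `oc -n openshift-machine-config-operator get configmap/coreos-bootimages -o json`.
    The data key name ("stream") is an assumption of this sketch.
    """
    stream = json.loads(configmap["data"][key])
    regions = (
        stream.get("architectures", {})
        .get(arch, {})
        .get("images", {})
        .get("aws", {})
        .get("regions", {})
    )
    entry = regions.get(region)
    return entry["image"] if entry else None
```

A controller would compare the returned value with the AMI currently referenced by a machineset and only then decide whether to update it; that write side is deliberately left out here.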
|
||
We can start on other nuanced problems like ensuring we only update the machinesets once a controlplane update is complete, or potentially even offering an option in IPI/machineAPI installs to drain and replace workers instead of doing in-place updates. | ||
|
||
#### Story 2 | ||
|
||
ACME Corp runs OpenShift 4 on vSphere in an on-premise environment not connected to the public Internet. They have (traditional) RHEL 7 already imported into the environment and already pre-configured and managed by the operations team. | ||
|
||
The administrator boots an instance there, logs in via ssh, downloads an `oc` binary. They proceed to follow the instructions for preparing a [mirror registry](https://docs.openshift.com/container-platform/4.7/installing/install_config/installing-restricted-networks-preparations.html). | ||
|
||
The administrator also uses `openshift-install coreos print-stream-json` and writes a script to parse the JSON to find the vSphere OVA and download it. Then the administrator uploads it to the vSphere instance. | ||
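The administrator's script could look roughly like the Python sketch below. It assumes the FCOS-style layout in which the OVA sits under `architectures.<arch>.artifacts.vmware.formats.ova.disk` with `location` and `sha256` fields; `find_vsphere_ova` and `sha256_of` are hypothetical helper names chosen for this example.

```python
import hashlib


def find_vsphere_ova(stream, arch="x86_64"):
    """Return (location, sha256) for the VMware OVA in a parsed stream document.

    Raises KeyError with a readable message if the stream has no such entry.
    """
    try:
        disk = stream["architectures"][arch]["artifacts"]["vmware"]["formats"]["ova"]["disk"]
    except KeyError as err:
        raise KeyError(f"no vmware OVA for {arch}: missing key {err}") from None
    return disk["location"], disk["sha256"]


def sha256_of(fileobj, chunk_size=1024 * 1024):
    """Compute the hex SHA-256 digest of a binary file object, reading in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

In practice the input would come from `openshift-install coreos print-stream-json > stream.json` followed by `json.load`. After downloading the file at `location`, comparing `sha256_of(open(path, "rb"))` with the expected digest guards against a corrupted transfer before the OVA is uploaded to vSphere.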
|
||
From that point, the operations team can use `openshift-install` in UPI mode, referencing that already uploaded bootimage and the internally mirrored OpenShift release image content. | ||
|
||
### Risks and Mitigations | ||
|
||
We may discover even more things depend on `rhcos.json` inside the installer. We may have to fall back to continuing to maintain a copy in the installer git (in the old format) for a cycle. | ||
|
||
## Design Details | ||
|
||
### Test Plan | ||
|
||
This will be well covered by existing CI flows for IPI. For UPI, it will become trickier because the jobs will need to become more OpenShift version dependent. | ||
|
||
### Graduation Criteria | ||
|
||
TBD | ||
|
||
### Version Skew Strategy | ||
|
||
We already have to deal with problems of skew in UPI installs in particular - things like administrators trying to use a 4.2 vSphere OVA to install 4.7, etc. This standardizes an API for the future around discovering and maintaining these images. | ||
|
||
## Implementation History | ||
|
||
## Drawbacks | ||
|
||
## Alternatives | ||
|
||
None, we need to do something here. |
want to understand what sort of changes MCO would need to do? bootimages today are ideally handled by machine-api, right?
This topic bridges the two I think. Indeed the machineAPI naturally would be a good place to update machinesets. But on the flip side, this stuff is completely CoreOS specific (like a lot of the MCO), whereas the machineAPI is a lot more generic (because it's forked from the upstream generic Kube cluster API).
Also if the goal is to use this to skip the firstboot os upgrade, we would want some careful coordination around analyzing the diff between machine-os-content and the bootimages to ensure we don't actually need to do the in-place update. IOW there's no point to doing the machineset update if we still need to firstboot reboot - so that's MCO knowledge.
Or as I commented below, perhaps this is even a new (small) operator. I don't have a really strong opinion to be clear. This enhancement just calls for injecting the data so we can later do that work.
(But, the configmap is placed in the `openshift-machine-config-operator` namespace; if you guys object to that or want it changed please say so!)
agree with making MCO smarter here to look at OSTree content rather than just checking the OSImageURL.
right
either the machineAPI or the MCO namespace sounds like a reasonable place for this. Based on your reasoning about MCO over machineAPI, I don't see any issue here.