-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhancement: add ignition dual spec 2/3 support #97
enhancement: add ignition dual spec 2/3 support #97
Conversation
Acceptance criteria: | ||
- Essentially the same as story 1 | ||
|
||
** As a user of Openshift, I’d like to install a fresh ignition spec 3 cluster ** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the users just want to install a cluster, imo there is less value in customers asking for spec2 vs spec3
above we mention that the installer can switch to spec3 completely, do we have some idea how we can make sure UPI customers are using the correct boot-images when they get the spec3 ignition configs but maybe try to use a spec3 supporting bootimage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the workflow should look the same for IPI
do we have some idea how we can make sure UPI customers are using the correct boot-images when they get the spec3 ignition configs but maybe try to use a spec3 supporting bootimage
We'll definitely have to communicate in the docs which preview version has spec 3 support. The installer maybe should add capability to detect if the user is still on an old version and reject the install.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The installer maybe should add capability to detect if the user is still on an old version and reject the install.
at what stage do you think the installer will be able to do that?
in UPI, the installer only outputs ignition files and rest is user controlled process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh true, I guess in this case the best we can do is to alert the user when they use a new version of the installer that this build will output spec 3 ignition configs, which is only supported by RHCOS versions x and later?
This goes back to openshift/installer#1399 which might be a better UPI workflow. Alternatively, for example if we decide to do this in 4.5, all 4.5 images + installer generated configs are spec 3 by default only. In that case a user would only run in to a version mistmatch if they try to install a 4.5 cluster with 4.4 bootimages, which they shouldn't do anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open to suggestions of course since UPI in general will be harder to coordinate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes back to openshift/installer#1399 which might be a better UPI workflow. Alternatively, for example if we decide to do this in 4.5, all 4.5 images + installer generated configs are spec 3 by default only. In that case a user would only run in to a version mistmatch if they try to install a 4.5 cluster with 4.4 bootimages, which they shouldn't do anyways.
if they try to install a 4.5 cluster with 4.4 bootimages, which they shouldn't do anyways.
that means that we don't have N-1 compatibility, which is not we tell our users! the n-1 should be fine as long as you don't depend on a feature that is in N.
This goes back to openshift/installer#1399 which might be a better UPI workflow.
I have specifically pushed back on that idea because, that allows us to demand this 4.4 works with 4.4 bootimage only, when that shouldn't be the case for most users. and also something we need to support in our clusters because 4.1 created clusters have new nodes join in at 4.1 and then pivot to say 4.4.
Open to suggestions of course since UPI in general will be harder to coordinate
+1
as long as we explicitly mention what we are doing for this case, i'm fine.
The translation will happen when the version of MCO with dual support and | ||
translator is first deployed; it will detect the existing config being spec 2, | ||
generate a new renderedconfig based on a translator from spec2 to spec 3, and | ||
do a “dummy reboot” (if necessary) to bring the cluster to the desired config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly do you mean by "dummy reboot" in this scenario? When would the "dummy reboot" be necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the cluster machineconfig is saved as a "desiredConfig" hash, I'm not sure the best way to translate that yet. This just means that after the specs are translated, if we need, we can set a new desiredConfig that specifies a successful translation to spec 3, which the cluster needs to reboot to bring the currentConfig up to.
Acceptance criteria: | ||
- The machineconfig is applied successfully, if the user has defined a correct spec 3 ignition snippet | ||
- The user is properly alerted if they attempt to apply a spec 2 config, and the machineconfig fails to apply | ||
- The user is given necessary docs to remove the undesired spec 2 config and to translate it to a spec 3 config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "the undesired spec 2 config" in this case? The last bullet says any spec 2 config would fail to apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it just means any spec 2 config in the system. Basically steps to oc delete machineconfig/xxx
and then apply a new one
Many existing tests will also have to be updated given the spec change. | ||
|
||
|
||
### Upgrade / Downgrade Strategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Downgrades from spec 3 to spec 2 are off the table, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We should never have to do that. We do however probably need rollback functionality to recover from bad machineconfig states.
- The user should have received notification that the update will be changing spec version, as well as received necessary documentation on how to recover a failed update | ||
- CI tests are put in place to make sure the existing versions can be updated to the new payload | ||
|
||
** As a user of openshift, I’d like to install from a spec 2 bootimage andimmediately update to a spec 3 payload ** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "andimmediately" -> "and immediately"
|
||
** How to properly detect which ignition version to serve new machines ** | ||
|
||
This will likely be somewhat difficult to handle, as the wrong version on a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we teach Ignition to request a version using Accept
headers? Looks like it may already do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And with the machine-config server switching on application/vnd.coreos.ignition+json;version=...
and an ability to translate spec-2 configs to spec-3 and spec-3 configs to spec-2, we'd be able to run a full 4.4.z (or whatever z stream) supporting both configs, and warning about 4.5 (or whatever) incompatibility if we saw any spec-2 requests come in.
cf9126c
to
3d6193d
Compare
Updated per discussions with @crawford, PTAL. |
The MCS will host both spec 3 and spec 2 configs, with a functionality to | ||
translate spec 3 down to spec 2 configs, and spec 2 up to spec 3 configs. | ||
The MCS will first check which ignition spec version a new node supports, | ||
before serving a config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there already something in place that we can leverage? or do we need to check with MAO (or something else?) to provide us with something like that
|
||
## Proposal | ||
|
||
This change is multi-component: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericavonb brought up that we might need to sync/communicate with KNI as well (iirc)
See coreos/ignition#889 |
- [ ] Docs are updated to reflect the new config version | ||
|
||
#### Phase 2: | ||
- [ ] RHCOS bootimage is updated to accept ignition spec 3 configs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec v3 only
3d6193d
to
9152ff4
Compare
Updated to reflect the new phase 1/2 plans. Specifically, our main objective in phase 1 becomes adding the ability to apply spec 3 machineconfigs. Does not involve installer/RHCOS. Also note the new |
- [ ] MCD gaining the ability to process both spec 2 and spec 3 configs | ||
- [ ] Create tests to apply spec 3 configs to spec 2 clusters successfully | ||
- [ ] An alerting mechanism is put in place for outdated and incompatible/non-translatable configs | ||
- [ ] Docs are updated to reflect the new config version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crawford can you ack this plan if it does look good to you? Not sure who else to bring in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignition is always capitalized.
Why aren’t we using spec 3 on new clusters in phase 1?
This enhancement proposal aims to add dual ignition specification version 2/3 | ||
(ignition version 0/2) support to Openshift 4.x, which currently only support | ||
ignition version 0 spec 2 for OS provisioning and machine config updates. We | ||
aim to introduce a non-breaking method to switch all new and existing clusters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn’t non-breaking, it’s just got a grace period.
The objective of this phase is to allow users to apply machineconfigs with | ||
ignition spec 3 support to new and existing clusters. The breakdown is: | ||
|
||
- a tool is created to translate between ignition spec versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MCO makes use directly of ignition typing via vendoring (currently 2.2), so we'd need a method to translate the typing on the system from 2->3, unless you think the MCO should be on 2 indefinitely for some reason
- RHCOS bootimages switches to only accept ignition spec 3 configs | ||
- The Openshift installer is updated to generate spec 3 configs | ||
- Remaining MC* components generate spec 3 only | ||
- MCO enforces that all configs are spec 3 before allowing the CVO to start the update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How’s this going to work? The MCO isn’t going to be able to distinguish between a z-stream update and a y-stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has not been fleshed out yet. I recall discussion regarding using unupgradable flag for this
requirement for security/compliance purposes in OCP. The existing version on | ||
RHCOS systems (ignition version v0.33) carries a spec version (spec version 2, | ||
henceforth “Spec 2”) that is not compatible with Spec 3. Thus we would like to | ||
update the ignition version on RHCOS/Installer/MCO to make use of the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we record the rational for this approach vs teaching Ignition/RHCOS how to handle both spec variants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If only ignition (the package) is updated to handle both, all MachineConfigs applied after firstboot will not be affected by the change. The MCO does e.g. file writing itself based on machineconfigs, which use ignition typing.
- [ ] A config translator is created to translate from ignition spec 3 to spec 2 | ||
- [ ] The MCO is updated to support both ignition spec 2 and 3, with: | ||
- [ ] MCC rendering spec 3 configs to spec 2 | ||
- [ ] MCD gaining the ability to process both spec 2 and spec 3 configs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the rendered configs are spec 2, do we need to bother with this in phase 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not, perhaps I phrased this poorly, will revise
- Those that we’re not 100% sure we can directly translate, but we can infer what the user is doing and do a translation | ||
- Those that we’re 100% sure we CANNOT translate directly, and requires user input for us to correctly translate | ||
|
||
During phase 1 we should attempt to translate on a best-effort basis. If the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should clarify that we will only perform the translation of each config if we are certain that it will be successful. “Best-effort” here refers to the act of visiting all configs, but not the translation for each config.
when the MCO with dual support is first deployed onto the cluster). This will | ||
only be required as part of the MCO. | ||
|
||
Note that there exists three types of spec 2 configs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the purpose of this document, there are two types: those that we are certain we can translate and everything else. We aren’t going to handle the uncertain and certain-failure cases differently (are we?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
the user that there are untranslatable configs. | ||
|
||
During phase two, we should fail updates unless the cluster is fully on spec | ||
3 config. This effectively means that UPI clusters are at risk when upgrading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are UPI clusters at risk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More likely the user supplied their own ignition config customizations
|
||
The bootstrap ignition configs are updated to be spec 3. The stub ignition | ||
configs for master/worker nodes are updated to spec 3, and referece a different | ||
endpoint of the MCS which will serve ignition spec 3 configs. RHCOS images pinned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The endpoint should be the same. Ignition already uses an Accept header to specify which spec versions it can process.
The MCO will flat out reject spec 2 configs, and refuse to upgrade clusters | ||
that have spec 2 bits. | ||
|
||
The MCO will also throw an alert upon: seeing a spec 2 machineconfig applied to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think we want to bother with alerts here. We can just validate the MachineConfig before the CR is committed.
9152ff4
to
37bd8f5
Compare
Added some fixes.
This will require non-trivial changes to the installer and MCO which the team doesn't think we have enough time for in the 4.4 timeframe |
Add an enhancement proposal for supporting both ignition spec 2 and 3 for OS provisioning/updating. Signed-off-by: Yu Qi Zhang <[email protected]>
37bd8f5
to
e1e6b6f
Compare
ping @openshift/openshift-architects it does lgtm, can any of you approve? /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: runcom, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
damn, I thought I couldn't approve but I still think this is gonna be the way we all agreed on - @crawford since you reviewed this, please take a last look at this. |
|
||
All machineconfigs are switched to spec 3. Spec 2 support will exist in the form | ||
of served Ignition configs to new nodes joining the cluster that have Ignition | ||
v0 in the bootimage. MCS will handle this with 2 endpoints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MCS will handle this with 2 endpoints.
This should be a single endpoint. Ignition already uses the Accept HTTP header to specify which configs it can accept and the order of preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @yuqi-zhang this is related also to the ignition endpoint contained in that secret created by the installer (and unmanaged)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can update the proposal. We would still first need an option to modify the secret anyways.
Add an enhancement proposal for supporting both ignition spec 2
and 3 for OS provisioning/updating.
Signed-off-by: Yu Qi Zhang [email protected]