Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1789581: Fix osImageURL upgrade race #1357

Merged
merged 2 commits into from
Jan 11, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions install/0000_80_machine-config-operator_05_osimageurl.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ metadata:
name: machine-config-osimageurl
namespace: openshift-machine-config-operator
data:
releaseVersion: 0.0.1-snapshot
# The OS payload, managed by the daemon + pivot + rpm-ostree
# https://github.com/openshift/machine-config-operator/issues/183
osImageURL: "registry.svc.ci.openshift.org/openshift:machine-os-content"
5 changes: 5 additions & 0 deletions pkg/operator/sync.go
Original file line number Diff line number Diff line change
Expand Up @@ -686,6 +686,11 @@ func (optr *Operator) getOsImageURL(namespace string) (string, error) {
if err != nil {
return "", err
}
releaseVersion := cm.Data["releaseVersion"]
optrVersion, _ := optr.vStore.Get("operator")
if releaseVersion != optrVersion {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any valid cases where the configmap from a previous setup would either be unversioned or not match while transitioning to versioned?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous configmaps won't be versioned; but that's fine because Golang doesn't have Option<> so the releaseVersion will be the empty string, which won't match. Which is what we want - we'll ignore the previous unversioned configmap.

Copy link
Member Author

@runcom runcom Jan 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notice there would still be a possible race where the 4.2 MCO reads the new 4.3 osimageurl configmap and could generate a new rendered mc with a newer osimageurl, that's why I believe we need to patch 4.2 as well to take versioning into account

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notice there would still be a possible race where the 4.2 MCO reads the new 4.3 osimageurl configmap and could generate a new rendered mc with a newer osimageurl, that's why I believe we need to patch 4.2 as well to take versioning into account

I would rather we address that race condition directly rather than trying to patch 4.2 which requires us to enforce that 4.2.x (unpatched) upgrades through 4.2.x+1(patched) before going to 4.3. That's possible in theory but seems to be cumbersome in practice.

Some thoughts:

  1. Version the key in the osimageurl configmap (e.g. osImageURL-0.0.1-snapshot: ...) so that each MCO release has its own image reference. This would address this upgrade issue by changing the key name in 4.3 and protects us from any future upgrade race-conditions and failures. It would also allow you to do some weird cross-version testing, like if you wanted to have multiple MCOs referencing different osimages for some reason.
  2. Set the osimageurl value on the MCO deployment directly - it can be a flag passed in to the operator, the release infra will replace the image reference in the deployment manifest the same as it does in the configmap. If there are other reasons for using a configmap, the MCO deployment can be changed to reference a configmap mounted in just like all the other images. This would reduce any version issues to mismatches in the CC/MCC which we've dealt with before.
  3. Make MCO updates atomic by putting everything that always and only updates together together. The osimageurl, MCO, MCC controllers, and MC templates. Simplify controller architecture, merge "operator" and "controller" #878

return "", fmt.Errorf("refusing to read osImageURL version %q, operator version %q", releaseVersion, optrVersion)
}
return cm.Data["osImageURL"], nil
}

Expand Down