-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support installing extensions and rework OS update #1941
Support installing extensions and rework OS update #1941
Conversation
Note that with this PR we don't need to do |
pkg/daemon/update.go
Outdated
return fmt.Errorf("error creating extensions repo %s: %v", extensionsRepo, err) | ||
} | ||
defer repo.Close() | ||
if _, err := repo.WriteString("[coreos-extensions]\nenabled=1\nmetadata_expire=1m\nbaseurl=" + osImageContentDir + "/extensions/\ngpgcheck=0\nskip_if_unavailable=False\n"); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use writeFileAtomically
pkg/daemon/update.go
Outdated
args := generateExtensionsArgs(oldConfig, newConfig) | ||
glog.Infof("Applying extensions : %+q", args) | ||
if err := exec.Command("rpm-ostree", args...).Run(); err != nil { | ||
return fmt.Errorf("Failed to execute rpm-ostree %+q : %v", args, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: errors start lowercase (glog uppercase)
/retest |
pkg/daemon/rpm-ostree.go
Outdated
var output []byte | ||
output, err = runGetOut("podman", inspectArgs...) | ||
output, err = runGetOut("skopeo", inspectArgs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to re-introduce openshift/pivot#51
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went skopeo route to save pulling container image as it is required by podman inspect. Is skopeo inspect more expensive than doing podman pull containerImg && podman inspect
?
I was also wondering if we can just use https://github.com/openshift/machine-config-operator/blob/master/pkg/daemon/rpm-ostree.go#L257,L280 to get checksum and rebase. It works but not sure if we hit issue in some situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we actually have the containers/image stack vendored into the MCO for other reasons, so we could just directly do that from our process. Another alternative looks like oc image info -o json
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could try to figure out if there's a way to get skopeo to not try to enumerate all the tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at skopeo code, it doesn't seem like we can ask skopeo to skip looking for tags(have also asked on skopeo irc channel to get a second opinion)
Not very comfortable with us maintaining the code to inspect images, feels like it is better to use already tested tool oc image info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with skopeo team it appears skipping to fetch RepoTags is not possible right now. With inspiration from skopeo inspect code, wrote an independent program which uses containers/image stack(already vendored in MCO) https://github.com/sinnykumari/container-inspect/blob/master/container_inspect.go and doesn't fetch RepoTags . If this looks ok I will integrate this in MCO, otherwise we can go for oc image info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed changes to use containers/image stack directly for inspecting image.
cid := strings.TrimSpace(string(cidBuf)) | ||
// Use the container ID to find its mount point | ||
var mntBuf []byte | ||
mntBuf, err = runGetOut("podman", "mount", cid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we missing a call to the extract function here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully I understood your question correctly.
We are now not mounting container image. https://github.com/openshift/machine-config-operator/pull/1941/files#diff-06961b075f1753956d802ba954d2cfb5R251 is where we extract OS container content in /run/mco-machine-os-content/ and use same for rpm-ostree rebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts
pkg/daemon/update.go
Outdated
} | ||
} | ||
|
||
if mcDiff.osUpdate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing bc the if stmt right before this is also checking for mcDiff.osUpdate. Why not nest this inside the above or not possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel moving this with earlier section is going to be tricky and may lead to un-necessary bugs. Both section are doing different things and right order is really important here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually if
is not needed here as we also compare OSImageURL in updateOS()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
pkg/daemon/update.go
Outdated
} | ||
dn.logSystem("Updating rt-kernel packages on host: %+q", args) | ||
if err := exec.Command("rpm-ostree", "update").Run(); err != nil { | ||
return fmt.Errorf("Failed to execute rpm-ostree %+q : %v", args, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lower case
pkg/daemon/update.go
Outdated
pivotErr, err2 := ioutil.ReadFile(pivottypes.PivotFailurePath) | ||
if err2 != nil || len(pivotErr) == 0 { | ||
glog.Warningf("pivot error file doesn't contain anything, it was never written or an error occurred: %v", err2) | ||
// Before rebase make sure that osImageContentDir exists. This is to continue with OS update in case we came here directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ this comment!
pkg/daemon/update.go
Outdated
return nil | ||
} | ||
|
||
func extractOSImage(imgURL string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this file has so many helper funcs it would be easier to move them into their own file for readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. will look at moving helper function in a separate file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that this is one of the things which we can discuss collectively to cleanup and refactor MCO code.
pkg/daemon/update.go
Outdated
return err | ||
} | ||
|
||
if mcDiff.osUpdate || mcDiff.extensions || mcDiff.kernelType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like this should be 1 func call that evals this if inside and then calls extract, addExtensions & updateOS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, there is scope of moving OS update/layering section into a function.
26b51d2
to
0ae69c8
Compare
Thank you all for reviewing! |
@@ -1004,7 +1004,10 @@ func (dn *Daemon) checkStateOnFirstRun() error { | |||
if !osMatch { | |||
glog.Infof("Bootstrap pivot required to: %s", targetOSImageURL) | |||
// This only returns on error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should move this comment down to the dn.finalizeAndReboot
now I suppose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm, we will return back here also when updateOS() fails?
pkg/daemon/update.go
Outdated
// addExtensionsRepo adds a repo into /etc/yum.repos.d/ which we use later to | ||
// install extensions and rt-kernel | ||
func addExtensionsRepo() error { | ||
repoConetnt := "[coreos-extensions]\nenabled=1\nmetadata_expire=1m\nbaseurl=" + osImageContentDir + "/extensions/\ngpgcheck=0\nskip_if_unavailable=False\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why gpgcheck=0
here? Shouldn't all the RPMs we ship via extensions be signed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, good catch. Don't know why I kept it like that, most likely copy paste error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had an offline discussion, but putting a comment here for future reference:
It is possible that some packages may not be signed, especially when dealing with nightly images. We are anyway shipping extensions packages in machine-os-content(which is source of truth for us). So it should be ok to install them with gpgcheck off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that it's ok to turn off signature verification. It seems like it leaves open a pretty significant security hole that attackers could exploit to run untrusted code on the host. While we check the machine-os-content image signature, we don't have any checks on the integrity of the files in the extensions directory ensuring that the RPMs that get installed are same ones shipped in the machine-os-image without alteration. There are likely other security gaps that could be enabled by installing unverified RPMs.
At the very least, I'm pretty sure many customers in highly-regulated industries require every package to be signed and validated to run on a host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we check the machine-os-content image signature, we don't have any checks on the integrity of the files in the extensions directory ensuring that the RPMs that get installed are same ones shipped in the machine-os-image without alteration.
We do, because we pull-by-digest and the container stack verifies that the digest matches.
pkg/daemon/update.go
Outdated
// addExtensionsRepo adds a repo into /etc/yum.repos.d/ which we use later to | ||
// install extensions and rt-kernel | ||
func addExtensionsRepo() error { | ||
repoConetnt := "[coreos-extensions]\nenabled=1\nmetadata_expire=1m\nbaseurl=" + osImageContentDir + "/extensions/\ngpgcheck=0\nskip_if_unavailable=False\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor typo: repoContent
6cdcc06
to
a714530
Compare
/retest |
a714530
to
990e561
Compare
/test cluster-bootimages |
2123350
to
33c3ce8
Compare
/retest |
/test cluster-bootimages |
Tests seems to be flaky |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Reading only the image inspect part.) ACK, just trivial nits.
pkg/daemon/container-inspect.go
Outdated
@@ -0,0 +1,77 @@ | |||
package daemon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn’t the file be named image-inspect.go
?
pkg/daemon/container-inspect.go
Outdated
return err | ||
} | ||
|
||
// parseImageSource converts image URL-like string to an ImageSource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With docker.ParseReference
it is no longer very URL-like; and I would expect this function to handle the "//"
prefix instead of callers having to care. Ideally, this should “convert an image reference”, or “create an ImageSource
from an image reference”.
pkg/daemon/container-inspect.go
Outdated
|
||
// parseImageSource converts image URL-like string to an ImageSource. | ||
// The caller must call .Close() on the returned ImageSource. | ||
func parseImageSource(ctx context.Context, name string) (types.ImageSource, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely non-blocking: newDockerImageSource
maybe; the “parse” name was used in context of a fairly general CLI that is not applicable in this call tree.
pkg/daemon/container-inspect.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
sys := &types.SystemContext{AuthFilePath: kubeletAuthFile} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller can pass this down instead of having two copies that can get out of sync.
@@ -8,23 +8,20 @@ require ( | |||
github.com/Masterminds/goutils v1.1.0 // indirect | |||
github.com/Masterminds/semver v1.4.2 // indirect | |||
github.com/Masterminds/sprig v2.20.0+incompatible | |||
github.com/Microsoft/go-winio v0.4.14 // indirect | |||
github.com/OpenPeeDeeP/depguard v1.0.1 // indirect | |||
github.com/apparentlymart/go-cidr v1.0.0 | |||
github.com/ashcrow/osrelease v0.0.0-20180626175927-9b292693c55c | |||
github.com/clarketm/json v1.14.1 | |||
github.com/containers/image v3.0.2+incompatible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, container-runtime-config (and openshift/runtime-utils) still use the old version. That should probably be updated to decrease duplication… but it’s also rather out of scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, we can clean this up separately.
pkg/daemon/rpm-ostree.go
Outdated
var imagedataArray []imageInspection | ||
err = json.Unmarshal(output, &imagedataArray) | ||
var imageData *types.ImageInspectInfo | ||
imageData, err = imageInspect(fmt.Sprintf("//%s", container)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned elsewhere, I’d expect the "//"
syntax to be hidden in the implementation.
Thanks for the review, pushed changes addressing the feedback. |
/test cluster-bootimages |
26c0487
to
966d8b8
Compare
Vendor latest containers/image v5.5.1 which contains utilities needed to inspect container image in MCO.
Also rename pkg/daemon/container-inspect.go to pkg/daemon/image-inspect.go to fit naming better with what it does.
Also, explicitly delete the extracted OS image content once done
0cd99b9
to
5b44766
Compare
Rebased to get e2e-gcp-op fixes from #1969 included |
needs lgtm blessing :) |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, runcom, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
e2e-aws is failing consistently today across various PRs. |
The PR is already approved by the cutoff, we usually don't override tests just to merge, they usually end up merging over the weekend. This PR hasn't passed this test since its initial run on Tuesday so I think we should just let the bot retest so it can pass. |
/retest Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
/retest Please review the full test history for this PR and help us cut down flakes. |
@sinnykumari: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest Please review the full test history for this PR and help us cut down flakes. |
With the introduction of extensions and reworking of OS updates in openshift#1941, we now run `oc/podman` directly in the MCD to fetch the OS image. Thus we need to inject the proxy env vars into the container via the daemonset definition, otherwise image pulls will fail. Signed-off-by: Yu Qi Zhang <[email protected]>
Earlier to perform OS update we were pulling OS image
and mounting the content using podman. We were performing
OS update in host context because of selinux constraints
on mounted container.
Also, for rt-kernel switch we were pulling OS image again.
With coreos extensions support, we require extensions rpm
which is available in os container.
We tried different approach to solve problems
like minimizing container image pull, host/container
context switch, selinux permission on mounted
container and rpm-ostreed behavior.
See #1850
Finally, using
oc image extract
and skopeo to inspect containerimage solves our problems. With this we are also getting
rid of using mco-pivot systemd service to rebase OS.