Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support installing extensions and rework OS update #1941

Merged
merged 11 commits into from
Aug 1, 2020

Conversation

sinnykumari
Copy link
Contributor

Earlier to perform OS update we were pulling OS image
and mounting the content using podman. We were performing
OS update in host context because of selinux constraints
on mounted container.

Also, for rt-kernel switch we were pulling OS image again.
With coreos extensions support, we require extensions rpm
which is available in os container.

We tried different approach to solve problems
like minimizing container image pull, host/container
context switch, selinux permission on mounted
container and rpm-ostreed behavior.
See #1850

Finally, using oc image extract and skopeo to inspect container
image solves our problems. With this we are also getting
rid of using mco-pivot systemd service to rebase OS.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2020
@sinnykumari sinnykumari requested a review from runcom July 21, 2020 11:08
@sinnykumari
Copy link
Contributor Author

Note that with this PR we don't need to do m-c-d pivot . So, if we require tunable args then we can either update update()->updateKernelArguments() accordingly or update our docs to perform tunable args via MachineConfigs.

@sinnykumari sinnykumari requested a review from LorbusChris July 21, 2020 11:17
return fmt.Errorf("error creating extensions repo %s: %v", extensionsRepo, err)
}
defer repo.Close()
if _, err := repo.WriteString("[coreos-extensions]\nenabled=1\nmetadata_expire=1m\nbaseurl=" + osImageContentDir + "/extensions/\ngpgcheck=0\nskip_if_unavailable=False\n"); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use writeFileAtomically

args := generateExtensionsArgs(oldConfig, newConfig)
glog.Infof("Applying extensions : %+q", args)
if err := exec.Command("rpm-ostree", args...).Run(); err != nil {
return fmt.Errorf("Failed to execute rpm-ostree %+q : %v", args, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: errors start lowercase (glog uppercase)

@sinnykumari
Copy link
Contributor Author

/retest

var output []byte
output, err = runGetOut("podman", inspectArgs...)
output, err = runGetOut("skopeo", inspectArgs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to re-introduce openshift/pivot#51

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went skopeo route to save pulling container image as it is required by podman inspect. Is skopeo inspect more expensive than doing podman pull containerImg && podman inspect ?

I was also wondering if we can just use https://github.com/openshift/machine-config-operator/blob/master/pkg/daemon/rpm-ostree.go#L257,L280 to get checksum and rebase. It works but not sure if we hit issue in some situation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually have the containers/image stack vendored into the MCO for other reasons, so we could just directly do that from our process. Another alternative looks like oc image info -o json.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could try to figure out if there's a way to get skopeo to not try to enumerate all the tags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By looking at skopeo code, it doesn't seem like we can ask skopeo to skip looking for tags(have also asked on skopeo irc channel to get a second opinion)
Not very comfortable with us maintaining the code to inspect images, feels like it is better to use already tested tool oc image info

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with skopeo team it appears skipping to fetch RepoTags is not possible right now. With inspiration from skopeo inspect code, wrote an independent program which uses containers/image stack(already vendored in MCO) https://github.com/sinnykumari/container-inspect/blob/master/container_inspect.go and doesn't fetch RepoTags . If this looks ok I will integrate this in MCO, otherwise we can go for oc image info

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed changes to use containers/image stack directly for inspecting image.

cid := strings.TrimSpace(string(cidBuf))
// Use the container ID to find its mount point
var mntBuf []byte
mntBuf, err = runGetOut("podman", "mount", cid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we missing a call to the extract function here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully I understood your question correctly.
We are now not mounting container image. https://github.com/openshift/machine-config-operator/pull/1941/files#diff-06961b075f1753956d802ba954d2cfb5R251 is where we extract OS container content in /run/mco-machine-os-content/ and use same for rpm-ostree rebase.

Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts

}
}

if mcDiff.osUpdate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing bc the if stmt right before this is also checking for mcDiff.osUpdate. Why not nest this inside the above or not possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel moving this with earlier section is going to be tricky and may lead to un-necessary bugs. Both section are doing different things and right order is really important here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually if is not needed here as we also compare OSImageURL in updateOS()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

}
dn.logSystem("Updating rt-kernel packages on host: %+q", args)
if err := exec.Command("rpm-ostree", "update").Run(); err != nil {
return fmt.Errorf("Failed to execute rpm-ostree %+q : %v", args, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower case

pivotErr, err2 := ioutil.ReadFile(pivottypes.PivotFailurePath)
if err2 != nil || len(pivotErr) == 0 {
glog.Warningf("pivot error file doesn't contain anything, it was never written or an error occurred: %v", err2)
// Before rebase make sure that osImageContentDir exists. This is to continue with OS update in case we came here directly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ this comment!

return nil
}

func extractOSImage(imgURL string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this file has so many helper funcs it would be easier to move them into their own file for readability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. will look at moving helper function in a separate file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this is one of the things which we can discuss collectively to cleanup and refactor MCO code.

return err
}

if mcDiff.osUpdate || mcDiff.extensions || mcDiff.kernelType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like this should be 1 func call that evals this if inside and then calls extract, addExtensions & updateOS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, there is scope of moving OS update/layering section into a function.

@sinnykumari
Copy link
Contributor Author

Thank you all for reviewing!
Have pushed changes which should address all comments other than skopeo issue(digging into it)

@@ -1004,7 +1004,10 @@ func (dn *Daemon) checkStateOnFirstRun() error {
if !osMatch {
glog.Infof("Bootstrap pivot required to: %s", targetOSImageURL)
// This only returns on error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move this comment down to the dn.finalizeAndReboot now I suppose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm, we will return back here also when updateOS() fails?

// addExtensionsRepo adds a repo into /etc/yum.repos.d/ which we use later to
// install extensions and rt-kernel
func addExtensionsRepo() error {
repoConetnt := "[coreos-extensions]\nenabled=1\nmetadata_expire=1m\nbaseurl=" + osImageContentDir + "/extensions/\ngpgcheck=0\nskip_if_unavailable=False\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why gpgcheck=0 here? Shouldn't all the RPMs we ship via extensions be signed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, good catch. Don't know why I kept it like that, most likely copy paste error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had an offline discussion, but putting a comment here for future reference:
It is possible that some packages may not be signed, especially when dealing with nightly images. We are anyway shipping extensions packages in machine-os-content(which is source of truth for us). So it should be ok to install them with gpgcheck off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that it's ok to turn off signature verification. It seems like it leaves open a pretty significant security hole that attackers could exploit to run untrusted code on the host. While we check the machine-os-content image signature, we don't have any checks on the integrity of the files in the extensions directory ensuring that the RPMs that get installed are same ones shipped in the machine-os-image without alteration. There are likely other security gaps that could be enabled by installing unverified RPMs.

At the very least, I'm pretty sure many customers in highly-regulated industries require every package to be signed and validated to run on a host.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we check the machine-os-content image signature, we don't have any checks on the integrity of the files in the extensions directory ensuring that the RPMs that get installed are same ones shipped in the machine-os-image without alteration.

We do, because we pull-by-digest and the container stack verifies that the digest matches.

// addExtensionsRepo adds a repo into /etc/yum.repos.d/ which we use later to
// install extensions and rt-kernel
func addExtensionsRepo() error {
repoConetnt := "[coreos-extensions]\nenabled=1\nmetadata_expire=1m\nbaseurl=" + osImageContentDir + "/extensions/\ngpgcheck=0\nskip_if_unavailable=False\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo: repoContent

@sinnykumari
Copy link
Contributor Author

/retest

@sinnykumari
Copy link
Contributor Author

/test cluster-bootimages

@sinnykumari sinnykumari force-pushed the coreos-extensions branch 2 times, most recently from 2123350 to 33c3ce8 Compare July 27, 2020 09:52
@sinnykumari
Copy link
Contributor Author

@mtrmac Thank you for the help with implementing container inspect. 28742e3 is how currently I have implemented in MCO, let us know if you have any additional feedback.

@sinnykumari
Copy link
Contributor Author

/retest

@sinnykumari
Copy link
Contributor Author

/test cluster-bootimages

@sinnykumari
Copy link
Contributor Author

Tests seems to be flaky
/retest

Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Reading only the image inspect part.) ACK, just trivial nits.

@@ -0,0 +1,77 @@
package daemon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t the file be named image-inspect.go?

return err
}

// parseImageSource converts image URL-like string to an ImageSource.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With docker.ParseReference it is no longer very URL-like; and I would expect this function to handle the "//" prefix instead of callers having to care. Ideally, this should “convert an image reference”, or “create an ImageSource from an image reference”.


// parseImageSource converts image URL-like string to an ImageSource.
// The caller must call .Close() on the returned ImageSource.
func parseImageSource(ctx context.Context, name string) (types.ImageSource, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely non-blocking: newDockerImageSource maybe; the “parse” name was used in context of a fairly general CLI that is not applicable in this call tree.

if err != nil {
return nil, err
}
sys := &types.SystemContext{AuthFilePath: kubeletAuthFile}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller can pass this down instead of having two copies that can get out of sync.

@@ -8,23 +8,20 @@ require (
github.com/Masterminds/goutils v1.1.0 // indirect
github.com/Masterminds/semver v1.4.2 // indirect
github.com/Masterminds/sprig v2.20.0+incompatible
github.com/Microsoft/go-winio v0.4.14 // indirect
github.com/OpenPeeDeeP/depguard v1.0.1 // indirect
github.com/apparentlymart/go-cidr v1.0.0
github.com/ashcrow/osrelease v0.0.0-20180626175927-9b292693c55c
github.com/clarketm/json v1.14.1
github.com/containers/image v3.0.2+incompatible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, container-runtime-config (and openshift/runtime-utils) still use the old version. That should probably be updated to decrease duplication… but it’s also rather out of scope of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we can clean this up separately.

var imagedataArray []imageInspection
err = json.Unmarshal(output, &imagedataArray)
var imageData *types.ImageInspectInfo
imageData, err = imageInspect(fmt.Sprintf("//%s", container))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, I’d expect the "//" syntax to be hidden in the implementation.

@sinnykumari
Copy link
Contributor Author

(Reading only the image inspect part.) ACK, just trivial nits.

Thanks for the review, pushed changes addressing the feedback.

@sinnykumari
Copy link
Contributor Author

/test cluster-bootimages

Vendor latest containers/image v5.5.1 which contains
utilities needed to inspect container image in MCO.
Also rename pkg/daemon/container-inspect.go to pkg/daemon/image-inspect.go
to fit naming better with what it does.
Also, explicitly delete the extracted OS image content once done
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 31, 2020
@sinnykumari
Copy link
Contributor Author

Rebased to get e2e-gcp-op fixes from #1969 included

@sinnykumari
Copy link
Contributor Author

needs lgtm blessing :)

@cgwalters
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 31, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, runcom, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,runcom,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sinnykumari
Copy link
Contributor Author

e2e-aws is failing consistently today across various PRs.
If it makes sense let's override it.

@kikisdeliveryservice
Copy link
Contributor

e2e-aws is failing consistently today across various PRs.
If it makes sense let's override it.

The PR is already approved by the cutoff, we usually don't override tests just to merge, they usually end up merging over the weekend. This PR hasn't passed this test since its initial run on Tuesday so I think we should just let the bot retest so it can pass.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Copy link
Contributor

@sinnykumari: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-proxy 26b51d2abc9a3842d062eede1321dbce2012522c link /test e2e-aws-proxy
ci/prow/e2e-metal-ipi 5b44766 link /test e2e-metal-ipi
ci/prow/okd-e2e-aws 5b44766 link /test okd-e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 17b8d09 into openshift:master Aug 1, 2020
yuqi-zhang added a commit to yuqi-zhang/machine-config-operator that referenced this pull request Aug 28, 2020
With the introduction of extensions and reworking of OS updates in
openshift#1941,
we now run `oc/podman` directly in the MCD to fetch the OS image.
Thus we need to inject the proxy env vars into the container via the
daemonset definition, otherwise image pulls will fail.

Signed-off-by: Yu Qi Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.