
OSASINFRA-3492: openstack: leverage ORC to handle RHCOS image #5139

Open. Wants to merge 1 commit into base: main.

EmilienM
Member

What this PR does / why we need it:

Instead of forcing users to provide an existing OpenStack Glance
image, we now let our CAPI provider upload the RHCOS image referenced
by the release payload and manage its lifecycle with ORC (OpenStack Resource Controller).

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 15, 2024
@openshift-ci-robot

openshift-ci-robot commented Nov 15, 2024

@EmilienM: This pull request references OSASINFRA-3492 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.


@openshift-ci openshift-ci bot added area/cli Indicates the PR includes changes for CLI area/documentation Indicates the PR includes changes for documentation labels Nov 15, 2024
Contributor

openshift-ci bot commented Nov 15, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: EmilienM
Once this PR has been reviewed and has the lgtm label, please assign bryan-cox for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Nov 15, 2024

netlify bot commented Nov 15, 2024

Deploy Preview for hypershift-docs ready!

🔨 Latest commit 110dfbd
🔍 Latest deploy log https://app.netlify.com/sites/hypershift-docs/deploys/6737827369ac7400081b21bf
😎 Deploy Preview https://deploy-preview-5139--hypershift-docs.netlify.app

type CoreOpenStackDisk struct {
Release string `json:"release"`
URL string `json:"url"`
}
Member Author

I want to add the Hash as well.
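A minimal sketch of what adding the hash could look like, assuming the stream metadata exposes a checksum next to the URL under a key named `sha256`. That key and the `SHA256` field are assumptions, not confirmed by this PR, and `parseOpenStackDisk` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CoreOpenStackDisk mirrors the struct under review. The SHA256 field and its
// "sha256" JSON key are hypothetical: they assume the release payload's stream
// metadata carries a checksum alongside the URL.
type CoreOpenStackDisk struct {
	Release string `json:"release"`
	URL     string `json:"url"`
	SHA256  string `json:"sha256,omitempty"`
}

// parseOpenStackDisk decodes one openstack-disk entry and rejects entries
// that have no download URL.
func parseOpenStackDisk(raw []byte) (CoreOpenStackDisk, error) {
	var disk CoreOpenStackDisk
	if err := json.Unmarshal(raw, &disk); err != nil {
		return CoreOpenStackDisk{}, fmt.Errorf("decoding openstack-disk: %w", err)
	}
	if disk.URL == "" {
		return CoreOpenStackDisk{}, fmt.Errorf("openstack-disk entry has no url")
	}
	return disk, nil
}
```

With a checksum available, the ORC Image could verify the downloaded content instead of trusting the URL alone.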

@EmilienM
Member Author

/cc bryan-cox
since you worked on that topic.

Contributor

@mdbooth mdbooth left a comment


This either needs wiring up to the NodePool so that it uses imageRef (preferable), or we need to add a wait loop somewhere in here to ensure that the image is Available before creating any machines that reference it.

We don't want to do the second one, so we should add an imageRef to NodePool.
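A sketch of the preferred option, using hypothetical, simplified types: `OpenStackNodePoolImage`, `MachineImage` and `resolveMachineImage` are illustrative names, not HyperShift or CAPO API. It assumes, as the review implies, that once the machine template references the ORC Image, CAPO itself waits for the image to become available, so HyperShift needs no wait loop of its own.

```go
package main

import "errors"

// OpenStackNodePoolImage is a hypothetical NodePool platform field. Exactly one
// of ImageName (an existing Glance image) or ImageRef (the name of an ORC Image
// object) is expected to be set.
type OpenStackNodePoolImage struct {
	ImageName string
	ImageRef  string
}

// MachineImage is a simplified stand-in for the image section of an
// OpenStackMachineTemplate: either a filter by name or a reference to an ORC Image.
type MachineImage struct {
	FilterName string
	RefName    string
}

var errNoImage = errors.New("either an image name or an image reference is required")

// resolveMachineImage maps the NodePool field onto the machine template.
// Assumption (from the review): with a reference, CAPO waits for the ORC Image
// to be available, so no extra wait loop is needed here.
func resolveMachineImage(img OpenStackNodePoolImage) (MachineImage, error) {
	switch {
	case img.ImageName != "" && img.ImageRef != "":
		return MachineImage{}, errors.New("image name and image reference are mutually exclusive")
	case img.ImageRef != "":
		return MachineImage{RefName: img.ImageRef}, nil
	case img.ImageName != "":
		return MachineImage{FilterName: img.ImageName}, nil
	default:
		return MachineImage{}, errNoImage
	}
}
```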

func reconcileOpenStackImageCR(ctx context.Context, client client.Client, createOrUpdate upsert.CreateOrUpdateFN, hcluster *hyperv1.HostedCluster, controlPlaneNamespace string) error {
openStackImage := orc.Image{
ObjectMeta: metav1.ObjectMeta{
Name: "rhcos" + hcluster.Name,
Contributor

Suggested change
-Name: "rhcos" + hcluster.Name,
+Name: "rhcos-" + hcluster.Name,

Contributor

We may also want to name it by the release image version so it can be shared between clusters 🤔

Member Author

I don't know if the resource could be shared across multiple clusters, that's an interesting idea. I'll investigate for sure.
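If the Image were named after the RHCOS release rather than the cluster, the name would still need to be a valid Kubernetes object name. A minimal sketch with a hypothetical `imageNameForRelease` helper that sanitizes a release string into a DNS-1123 subdomain; whether ORC Images can actually be shared across clusters is the open question above and is not addressed here.

```go
package main

import "strings"

// imageNameForRelease derives an object name from an RHCOS release string.
// Sanitizing follows Kubernetes DNS-1123 subdomain rules: lowercase
// alphanumerics, '-' and '.', starting and ending with an alphanumeric,
// at most 253 characters.
func imageNameForRelease(release string) string {
	var b strings.Builder
	b.WriteString("rhcos-")
	for _, r := range strings.ToLower(release) {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9', r == '-', r == '.':
			b.WriteRune(r)
		default:
			// Any other character (including non-ASCII) becomes '-'.
			b.WriteRune('-')
		}
	}
	name := b.String()
	if len(name) > 253 {
		name = name[:253]
	}
	// A trailing '-' or '.' is not allowed; the "rhcos-" prefix keeps the start valid.
	return strings.TrimRight(name, "-.")
}
```

Sharing would also depend on where the Image object lives relative to each cluster's control plane namespace; this sketch only covers the name.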

Spec: orc.ImageSpec{},
}

if _, err := createOrUpdate(ctx, client, &openStackImage, func() error {
Contributor

TODO: check what createOrUpdate() does. Given that we have applyconfigs it's trivial for us to SSA these, so there's really no need for a function that actually does create or update.

Member Author

From the doc:

// CreateOrUpdate is a copy of controllerutil.CreateOrUpdate with
// an important difference: It copies a number of fields from the object
// on the server to the mutated object if unset in the latter. This
// avoids unnecessary updates when our code sets a whole struct that
// has fields that get defaulted by the server.
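To illustrate the doc comment, here is a simplified sketch with a toy `imageSpec` type (not the ORC type) and hypothetical helpers. It shows why copying server-defaulted fields into the desired object avoids spurious updates; it is not HyperShift's `upsert` implementation.

```go
package main

// imageSpec is a toy spec: Visibility stands in for a field that the API
// server defaults when the client leaves it empty.
type imageSpec struct {
	URL        string
	Visibility string
}

// copyDefaultsIfUnset fills fields left empty in desired from the object
// currently stored on the server, mimicking in a simplified way what the
// CreateOrUpdate doc comment describes.
func copyDefaultsIfUnset(desired *imageSpec, existing imageSpec) {
	if desired.Visibility == "" {
		desired.Visibility = existing.Visibility
	}
}

// needsUpdate reports whether an update call is still required once
// server-defaulted fields have been copied over.
func needsUpdate(desired, existing imageSpec) bool {
	copyDefaultsIfUnset(&desired, existing)
	return desired != existing
}
```

Server-side apply avoids the problem differently: the client sends only the fields it owns, so fields defaulted by the server never show up as a diff, which is why the reviewer points to SSA with applyconfigs.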

return rhcosImageURL, rhcosImageRelease, fmt.Errorf("arch does not exist in release image, arch: %s", supportedArch)
}

rhcosImageURL = releaseImage.StreamMetadata.Architectures[hyperv1.ArchAliases[supportedArch]].RHCOS.OpenStackDisk.URL
Contributor

I'm 99% sure we can grab the SHA256s from here, too. It's probably in OpenStackDisk.
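The lookup in the diff indexes maps directly. Below is a sketch with trimmed stand-in types that checks each step, so a missing architecture or an empty `openstack-disk` entry surfaces as an error rather than an empty URL. `archAliases` is a hypothetical stand-in for `hyperv1.ArchAliases`, and its entries are assumptions.

```go
package main

import "fmt"

// Trimmed stand-ins for the path used in the diff:
// StreamMetadata.Architectures[arch].RHCOS.OpenStackDisk.URL
type CoreOpenStackDisk struct {
	Release string
	URL     string
}

type CoreRHCOSImage struct {
	OpenStackDisk CoreOpenStackDisk
}

type CoreOSArchitecture struct {
	RHCOS CoreRHCOSImage
}

type StreamMetadata struct {
	Architectures map[string]CoreOSArchitecture
}

// archAliases is hypothetical: it maps NodePool arch names to stream-metadata arch names.
var archAliases = map[string]string{"amd64": "x86_64", "arm64": "aarch64"}

// openStackDiskForArch checks every lookup step instead of relying on map
// zero values, which would otherwise yield an empty URL silently.
func openStackDiskForArch(meta StreamMetadata, arch string) (url, release string, err error) {
	streamArch, ok := archAliases[arch]
	if !ok {
		return "", "", fmt.Errorf("unknown arch: %s", arch)
	}
	a, ok := meta.Architectures[streamArch]
	if !ok {
		return "", "", fmt.Errorf("arch does not exist in release image, arch: %s", arch)
	}
	disk := a.RHCOS.OpenStackDisk
	if disk.URL == "" {
		return "", "", fmt.Errorf("release image has no openstack-disk for arch: %s", arch)
	}
	return disk.URL, disk.Release, nil
}
```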

// we plan to deliver the OpenStack provider as a dev preview.
return nil, fmt.Errorf("image name is required")
openStackMachineTemplate.Template.Spec.Image.ImageRef = &capiopenstackv1beta1.ResourceReference{
Name: "rhcos" + hcluster.Name,
Contributor

Suggested change
-Name: "rhcos" + hcluster.Name,
+Name: "rhcos-" + hcluster.Name,

(Or version number)

@@ -69,14 +69,20 @@ type CoreOSImages struct {
}

type CoreRHCOSImage struct {
-AzureDisk CoreAzureDisk `json:"azure-disk"`
+AzureDisk     CoreAzureDisk     `json:"azure-disk"`
+OpenStackDisk CoreOpenStackDisk `json:"openstack-disk"`
Contributor

Are you sure this is required? Doesn't seem to be used. What do you expect to go here?

@@ -498,3 +505,39 @@ func defaultWorkerSecurityGroupRules(machineCIDRs []string) []capo.SecurityGroup

return ingressRules
}

// lookupRHCOSImage looks up a release image and extracts the RHCOS image URL and release version
func lookupRHCOSImage(ctx context.Context, client client.Client, hcluster *hyperv1.HostedCluster) (string, string, error) {
Member

We should just leverage the existing lookupRHCOSImage and make it an exported function. We could pass in a platform field to differentiate between OSP and Azure.
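A sketch of the suggested refactor, with a hypothetical exported `LookupRHCOSImage` that selects the Azure or OpenStack disk entry by platform. The types are trimmed stand-ins, and the real function's signature (which also fetches the release image) is not shown in this PR excerpt.

```go
package main

import "fmt"

// diskInfo is a trimmed stand-in: both disk entries carry a release and a URL,
// as in the CoreAzureDisk and CoreOpenStackDisk structs discussed in this PR.
type diskInfo struct {
	Release string
	URL     string
}

type rhcosImages struct {
	AzureDisk     diskInfo
	OpenStackDisk diskInfo
}

// PlatformType is a hypothetical stand-in for hyperv1.PlatformType.
type PlatformType string

const (
	AzurePlatform     PlatformType = "Azure"
	OpenStackPlatform PlatformType = "OpenStack"
)

// LookupRHCOSImage takes the platform and returns the matching disk's URL
// and release, or an error if the platform is unsupported or the entry is empty.
func LookupRHCOSImage(images rhcosImages, platform PlatformType) (url, release string, err error) {
	var disk diskInfo
	switch platform {
	case AzurePlatform:
		disk = images.AzureDisk
	case OpenStackPlatform:
		disk = images.OpenStackDisk
	default:
		return "", "", fmt.Errorf("no RHCOS disk image lookup for platform %q", platform)
	}
	if disk.URL == "" {
		return "", "", fmt.Errorf("release image has no %s disk image", platform)
	}
	return disk.URL, disk.Release, nil
}
```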

@@ -67,6 +67,13 @@ func (c *CAPI) Reconcile(ctx context.Context) error {
return err
}

// Reconcile ORC resources
if nodePool.Spec.Platform.Type == hyperv1.OpenStackPlatform {
Member Author

I don't know if this pattern will be accepted. Suggestions are welcome!

Member Author

@bryan-cox I realized that we probably need to handle the OpenStack RHCOS image per NodePool, since the release image can be overridden per NodePool?
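If each NodePool can pin its own release, the image has to follow the NodePool's release rather than the HostedCluster's. A sketch with hypothetical trimmed types that groups NodePools by the release they actually boot, which tells the controller how many distinct RHCOS images it needs. The fallback to the HostedCluster release when a NodePool leaves it empty is an assumption.

```go
package main

// Hypothetical, trimmed views of the two objects: only the release image fields.
type hostedCluster struct {
	Name         string
	ReleaseImage string
}

type nodePool struct {
	Name         string
	ReleaseImage string // may differ from the HostedCluster's release
}

// effectiveReleaseImage returns the release whose RHCOS disk this NodePool
// should boot. Falling back to the HostedCluster is an assumption of this sketch.
func effectiveReleaseImage(hc hostedCluster, np nodePool) string {
	if np.ReleaseImage != "" {
		return np.ReleaseImage
	}
	return hc.ReleaseImage
}

// poolsByRelease groups NodePool names by effective release image; each key
// would need its own RHCOS image, while pools on the same release could share one.
func poolsByRelease(hc hostedCluster, pools []nodePool) map[string][]string {
	byRelease := make(map[string][]string)
	for _, np := range pools {
		rel := effectiveReleaseImage(hc, np)
		byRelease[rel] = append(byRelease[rel], np.Name)
	}
	return byRelease
}
```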

func ReconcileOpenStackImageCR(ctx context.Context, client client.Client, createOrUpdate upsert.CreateOrUpdateFN, hcluster *hyperv1.HostedCluster, controlPlaneNamespace string, release *releaseinfo.ReleaseImage) error {
openStackImage := orc.Image{
ObjectMeta: metav1.ObjectMeta{
Name: "rhcos-" + hcluster.Name,
Member Author

TODO: name


openStackImageSpec.Resource = &orc.ImageResourceSpec{
// THIS IS NOT GOOD, NEEDS TO BE FIXED (something related to the image itself should be used, like version)
Name: "rhcos-" + hcluster.Name,
Member Author

TODO: name

@EmilienM
Member Author

/test e2e-openstack

@@ -54,7 +54,7 @@ func defaultImage(releaseImage *releaseinfo.ReleaseImage) (string, string, error
return containerImage, split[1], nil
}

-func unsupportedOpenstackDefaultImage(releaseImage *releaseinfo.ReleaseImage) (string, string, error) {
+func UnsupportedOpenstackDefaultImage(releaseImage *releaseinfo.ReleaseImage) (string, string, error) {
Member Author

We probably want to move that function to utils or something

Contributor

openshift-ci bot commented Nov 18, 2024

@EmilienM: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/unit | 042fd11 | link | true | /test unit |
| ci/prow/e2e-openstack | 042fd11 | link | false | /test e2e-openstack |
| ci/prow/okd-scos-e2e-aws-ovn | 042fd11 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-aws-4-17 | 042fd11 | link | true | /test e2e-aws-4-17 |


@EmilienM
Member Author

For myself: we need to update RBAC:

{"level":"error","ts":"2024-11-18T21:20:51Z","msg":"Failed to reconcile NodePool","controller":"nodepool","controllerGroup":"hypershift.openshift.io","controllerKind":"NodePool","NodePool":{"name":"example-n42fg","namespace":"e2e-clusters-mgz8n"},"namespace":"e2e-clusters-mgz8n","name":"example-n42fg","reconcileID":"dc9ea120-8120-400a-af0a-3b265526072a","error":"images.openstack.k-orc.cloud is forbidden: User \"system:serviceaccount:hypershift:operator\" cannot create resource \"images\" in API group \"openstack.k-orc.cloud\" in the namespace \"e2e-clusters-mgz8n-example-n42fg\"","stacktrace":"github.com/openshift/hypershift/hypershift-operator/controllers/nodepool.(*NodePoolReconciler).Reconcile\n\t/hypershift/hypershift-operator/controllers/nodepool/nodepool_controller.go:205\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222"}
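The error shows that the operator's service account cannot create `images` in the `openstack.k-orc.cloud` API group. Below is a dependency-free sketch of the missing rule. `PolicyRule` mirrors the shape of `rbacv1.PolicyRule`, `allows` is a hypothetical checker, and every verb beyond `create` is an assumption: get and update because createOrUpdate reads the object before writing it, and the rest for watching and cleanup.

```go
package main

// PolicyRule mirrors the shape of rbacv1.PolicyRule so the sketch stays dependency-free.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// orcImageRule is a hypothetical addition to the hypershift operator's role.
// The log only shows "create" being denied; the other verbs are assumptions.
var orcImageRule = PolicyRule{
	APIGroups: []string{"openstack.k-orc.cloud"},
	Resources: []string{"images"},
	Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
}

// contains reports whether list holds s or the RBAC wildcard "*".
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s || v == "*" {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in group.
func allows(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}
```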

4 participants