Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNF-11234: Enable RTE metrics to be scraped securely by Prometheus #1035

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

rbaturov
Copy link
Contributor

@rbaturov rbaturov commented Oct 6, 2024

This is a follow-up PR for Enable NROP metrics to be to scraped securely by Prometheus .

This PR encompasses and integrates all the needed infrastructure to enable secured communication for scraping metrics by Prometheus, now for RTE.

A follow-up PR will be posted for e2e tests later.

To validate that this PR is functioning correctly, please follow these steps:

  1. build image of the operator (make docker-build docker-push)
  2. run: make deploy
  3. Attach to one of the prometheus pods oc exec -it prometheus-k8s-0 -n openshift-monitoring /bin/bash
  4. run:
curl -v \
--cacert /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt \
-H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
https://rte-metrics-service.numaresources.svc:8443/metrics

Signed-off-by: Ronny Baturov <[email protected]>
This commit consist of the following changes:

* Reenabled kube-rbac-proxy sidecar container to securely expose the /metrics endpoint for Prometheus scraping.
* Added a secret to enforce HTTPS-only access to the /metrics endpoint, restricted to the Prometheus service account.
* modified ServiceMonitor resource to enable Prometheus pods to scrape metrics.
* Added an annotation to the deployment Service, which is monitored by the Service CA operator. This operator will generate the tls.key and tls.crt files inside the secret-kube-rbac-proxy-tls secret, which is used by the kube-rbac-proxy container.
* Added Role and RoleBinding resources to grant the necessary permissions to the Prometheus service account.

Most of this configuration was based on this guide:
https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/

Signed-off-by: Ronny Baturov <[email protected]>
Signed-off-by: Ronny Baturov <[email protected]>
@openshift-ci-robot
Copy link

@rbaturov: This pull request references CNF-11234 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This is a follow-up PR for Enable NROP metrics to be to scraped securely by Prometheus .

This PR encompasses and integrates all the needed infrastructure to enable secured communication for scraping metrics by Prometheus, now for RTE.

A follow-up PR will be posted for e2e tests later.

To validate that this PR is functioning correctly, please follow these steps:

  1. build image of the operator (make docker-build docker-push)
  2. run: make deploy
  3. Attach to one of the prometheus pods oc exec -it prometheus-k8s-0 -n openshift-monitoring /bin/bash
  4. run:
curl -v \
--cacert /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt \
-H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
https://rte-metrics-service.numaresources.svc:8443/metrics

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

openshift-ci bot commented Oct 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rbaturov
Once this PR has been reviewed and has the lgtm label, please assign swatisehgal for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This allows injecting the kube-rbac-proxy container into the RTE DaemonSet before deployment.

Signed-off-by: Ronny Baturov <[email protected]>
Added functionality to pull the kube-proxy-sidecar image by default, using the same image as the operator.

Signed-off-by: Ronny Baturov <[email protected]>
We will use the pkg/metrics to generate the needed metrics manifests to be applied by the operator.

Signed-off-by: Ronny Baturov <[email protected]>
As part of enabling metrics for RTE, a Service resource is created during the deployment of the RTE metrics manifests by the operator. This commit grants the operator pod the necessary permissions to deploy the Service CR.

Signed-off-by: Ronny Baturov <[email protected]>
* Integrating RTE metrics manifests to be deployed by the operator
* This adds unit test for metrics components creation

Signed-off-by: Ronny Baturov <[email protected]>
Signed-off-by: Ronny Baturov <[email protected]>
rbaturov added a commit to rbaturov/numaresources-operator that referenced this pull request Oct 7, 2024
We need to modify the ClusterRole to grant the required permissions for the manager and RTE-worker pods to access the /metrics endpoint. This is essential for e2e testing, as we will run curl commands from within the manager and RTE-worker containers to interact with the endpoint. Since this ClusterRole is also used by the RTE service account (as referenced in PR openshift-kni#1035), this change will grant access to both pods.

Signed-off-by: Ronny Baturov <[email protected]>
@rbaturov rbaturov changed the title CNF-11234: Enable RTE metrics to be scraped securely by Prometheus. CNF-11234: Enable RTE metrics to be scraped securely by Prometheus Oct 9, 2024
Copy link
Member

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main goal of this work was to securely expose the metrics, not to configure the cluster to consume them.
We can do this for 4.18, but the backportable code should only expose the server metrics without configuring the manifests (and adding permissions)

},
},
SecurityContext: &corev1.SecurityContext{
AllowPrivilegeEscalation: &[]bool{false}[0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptr.To?

@@ -199,6 +201,80 @@ func ContainerConfig(ds *appsv1.DaemonSet, name string) error {
return nil
}

func SidecarContainerConfig(ds *appsv1.DaemonSet) {
// Define the sidecar container
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant comment(s)

},
}

// Define the volumes for the sidecar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant comment(s)

},
}

// Add the volumes to the DaemonSet spec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant comment(s)

// Add the volumes to the DaemonSet spec
ds.Spec.Template.Spec.Volumes = append(ds.Spec.Template.Spec.Volumes, sidecarVolumes...)

// Add the sidecar container to the DaemonSet spec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant comment(s)

@@ -273,6 +273,13 @@ func SidecarContainerConfig(ds *appsv1.DaemonSet) {

// Add the sidecar container to the DaemonSet spec
ds.Spec.Template.Spec.Containers = append(ds.Spec.Template.Spec.Containers, sidecarContainer)

// Pull kube-rbac-proxy image from operator proxy.
if sidecarImage, err := images.GetKubeRbacProxyImage(context.TODO()); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a parameter, this code should not access the cluster APIs. We should detect this image once (in main.go and pass it down here.

if err != nil {
return nil, errors.Wrapf(err, "failed to set controller reference to %s %s", obj.GetNamespace(), obj.GetName())
}
err = r.Client.Create(ctx, obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or updated, the owned object should be continuosly reconciled.

@rbaturov
Copy link
Contributor Author

the main goal of this work was to securely expose the metrics, not to configure the cluster to consume them. We can do this for 4.18, but the backportable code should only expose the server metrics without configuring the manifests (and adding permissions)

In order for the sidecar be ready and operating (i. e serving metrics), the service account, and the clusterrolebinding are mandatory. look for this comment for more info

@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2024
Copy link
Contributor

openshift-ci bot commented Nov 7, 2024

@rbaturov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ci-e2e-install-hypershift c25a99d link true /test ci-e2e-install-hypershift
ci/prow/ci-index c25a99d link true /test ci-index

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants