-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CNF-11234: Enable RTE metrics to be scraped securely by Prometheus #1035
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Ronny Baturov <[email protected]>
This commit consist of the following changes: * Reenabled kube-rbac-proxy sidecar container to securely expose the /metrics endpoint for Prometheus scraping. * Added a secret to enforce HTTPS-only access to the /metrics endpoint, restricted to the Prometheus service account. * modified ServiceMonitor resource to enable Prometheus pods to scrape metrics. * Added an annotation to the deployment Service, which is monitored by the Service CA operator. This operator will generate the tls.key and tls.crt files inside the secret-kube-rbac-proxy-tls secret, which is used by the kube-rbac-proxy container. * Added Role and RoleBinding resources to grant the necessary permissions to the Prometheus service account. Most of this configuration was based on this guide: https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/ Signed-off-by: Ronny Baturov <[email protected]>
Signed-off-by: Ronny Baturov <[email protected]>
Signed-off-by: Ronny Baturov <[email protected]>
@rbaturov: This pull request references CNF-11234 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: rbaturov The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This allows injecting the kube-rbac-proxy container into the RTE DaemonSet before deployment. Signed-off-by: Ronny Baturov <[email protected]>
Added functionality to pull the kube-proxy-sidecar image by default, using the same image as the operator. Signed-off-by: Ronny Baturov <[email protected]>
We will use the pkg/metrics to generate the needed metrics manifests to be applied by the operator. Signed-off-by: Ronny Baturov <[email protected]>
As part of enabling metrics for RTE, a Service resource is created during the deployment of the RTE metrics manifests by the operator. This commit grants the operator pod the necessary permissions to deploy the Service CR. Signed-off-by: Ronny Baturov <[email protected]>
* Integrating RTE metrics manifests to be deployed by the operator * This adds unit test for metrics components creation Signed-off-by: Ronny Baturov <[email protected]>
712c437
to
e741fe1
Compare
Signed-off-by: Ronny Baturov <[email protected]>
c7d66fa
to
c25a99d
Compare
We need to modify the ClusterRole to grant the required permissions for the manager and RTE-worker pods to access the /metrics endpoint. This is essential for e2e testing, as we will run curl commands from within the manager and RTE-worker containers to interact with the endpoint. Since this ClusterRole is also used by the RTE service account (as referenced in PR openshift-kni#1035), this change will grant access to both pods. Signed-off-by: Ronny Baturov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main goal of this work was to securely expose the metrics, not to configure the cluster to consume them.
We can do this for 4.18, but the backportable code should only expose the server metrics without configuring the manifests (and adding permissions)
}, | ||
}, | ||
SecurityContext: &corev1.SecurityContext{ | ||
AllowPrivilegeEscalation: &[]bool{false}[0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr.To
?
@@ -199,6 +201,80 @@ func ContainerConfig(ds *appsv1.DaemonSet, name string) error { | |||
return nil | |||
} | |||
|
|||
func SidecarContainerConfig(ds *appsv1.DaemonSet) { | |||
// Define the sidecar container |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant comment(s)
}, | ||
} | ||
|
||
// Define the volumes for the sidecar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant comment(s)
}, | ||
} | ||
|
||
// Add the volumes to the DaemonSet spec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant comment(s)
// Add the volumes to the DaemonSet spec | ||
ds.Spec.Template.Spec.Volumes = append(ds.Spec.Template.Spec.Volumes, sidecarVolumes...) | ||
|
||
// Add the sidecar container to the DaemonSet spec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant comment(s)
@@ -273,6 +273,13 @@ func SidecarContainerConfig(ds *appsv1.DaemonSet) { | |||
|
|||
// Add the sidecar container to the DaemonSet spec | |||
ds.Spec.Template.Spec.Containers = append(ds.Spec.Template.Spec.Containers, sidecarContainer) | |||
|
|||
// Pull kube-rbac-proxy image from operator proxy. | |||
if sidecarImage, err := images.GetKubeRbacProxyImage(context.TODO()); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be a parameter, this code should not access the cluster APIs. We should detect this image once (in main.go
and pass it down here.
if err != nil { | ||
return nil, errors.Wrapf(err, "failed to set controller reference to %s %s", obj.GetNamespace(), obj.GetName()) | ||
} | ||
err = r.Client.Create(ctx, obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or updated, the owned object should be continuosly reconciled.
In order for the sidecar be ready and operating (i. e serving metrics), the service account, and the clusterrolebinding are mandatory. look for this comment for more info |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@rbaturov: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This is a follow-up PR for Enable NROP metrics to be to scraped securely by Prometheus .
This PR encompasses and integrates all the needed infrastructure to enable secured communication for scraping metrics by Prometheus, now for RTE.
A follow-up PR will be posted for e2e tests later.
To validate that this PR is functioning correctly, please follow these steps:
make docker-build docker-push
)make deploy
oc exec -it prometheus-k8s-0 -n openshift-monitoring /bin/bash