CNF-10142: Enable NROP metrics to be scraped securely by Prometheus #1007

Open · wants to merge 3 commits into base: main
@@ -0,0 +1,19 @@
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.openshift.io/serving-cert-secret-name: secret-kube-rbac-proxy-tls
  creationTimestamp: null
  labels:
    control-plane: controller-manager
  name: numaresources-controller-manager-metrics-service
spec:
  ports:
  - name: https
    port: 8443
    protocol: TCP
    targetPort: https
  selector:
    control-plane: controller-manager
status:
  loadBalancer: {}
@@ -0,0 +1,18 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: numaresources-controller-manager
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    path: /metrics
    scheme: https
    targetPort: 8443
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      insecureSkipVerify: false
      serverName: numaresources-controller-manager-metrics-service.numaresources.svc
  selector:
    matchLabels:
      control-plane: controller-manager
@@ -0,0 +1,10 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: numaresources-metrics-reader
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get
48 changes: 48 additions & 0 deletions bundle/manifests/numaresources-operator.clusterserviceversion.yaml
@@ -465,6 +465,18 @@ spec:
- get
- list
- update
- apiGroups:
- authentication.k8s.io
resources:
- tokenreviews
verbs:
- create
- apiGroups:
- authorization.k8s.io
resources:
- subjectaccessreviews
verbs:
- create
serviceAccountName: numaresources-controller-manager
deployments:
- label:
@@ -528,6 +540,35 @@ spec:
memory: 20Mi
securityContext:
allowPrivilegeEscalation: false
- args:
- --secure-listen-address=0.0.0.0:8443
- --upstream=http://127.0.0.1:8080/
- --config-file=/etc/kube-rbac-proxy/config.yaml
- --tls-cert-file=/etc/tls/private/tls.crt
- --tls-private-key-file=/etc/tls/private/tls.key
- --allow-paths=/metrics
- --logtostderr=true
- -v=10
image: gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0
name: kube-rbac-proxy
ports:
- containerPort: 8443
name: https
protocol: TCP
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/kube-rbac-proxy
name: secret-kube-rbac-proxy-metric
readOnly: true
- mountPath: /etc/tls/private
name: secret-kube-rbac-proxy-tls
readOnly: true
securityContext:
runAsNonRoot: true
serviceAccountName: numaresources-controller-manager
@@ -537,6 +578,13 @@ spec:
key: node-role.kubernetes.io/control-plane
- effect: NoSchedule
key: node-role.kubernetes.io/master
volumes:
- name: secret-kube-rbac-proxy-tls
secret:
secretName: secret-kube-rbac-proxy-tls
- name: secret-kube-rbac-proxy-metric
secret:
secretName: numaresources-secret-kube-rbac-proxy-metric
permissions:
- rules:
- apiGroups:
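The new authentication.k8s.io tokenreviews and authorization.k8s.io subjectaccessreviews create rules above are what allow the kube-rbac-proxy sidecar, running under the numaresources-controller-manager service account, to validate a scraper's bearer token and check whether it may GET /metrics. The following is a minimal client-go sketch of those two calls; it is illustrative only (the helper name and in-cluster setup are assumptions, not the proxy's actual code).

// Illustrative sketch (not the kube-rbac-proxy implementation) of the two
// API calls the tokenreviews/subjectaccessreviews rules above allow.
package main

import (
	"context"
	"fmt"

	authnv1 "k8s.io/api/authentication/v1"
	authzv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// authorizeScrape is a hypothetical helper: authenticate the caller's bearer
// token, then check that it may GET the non-resource URL /metrics.
func authorizeScrape(ctx context.Context, bearerToken string) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	// TokenReview: requires create on tokenreviews (authentication.k8s.io).
	tr, err := cs.AuthenticationV1().TokenReviews().Create(ctx,
		&authnv1.TokenReview{Spec: authnv1.TokenReviewSpec{Token: bearerToken}},
		metav1.CreateOptions{})
	if err != nil {
		return err
	}
	if !tr.Status.Authenticated {
		return fmt.Errorf("bearer token is not authenticated")
	}

	// SubjectAccessReview: requires create on subjectaccessreviews (authorization.k8s.io).
	sar, err := cs.AuthorizationV1().SubjectAccessReviews().Create(ctx,
		&authzv1.SubjectAccessReview{Spec: authzv1.SubjectAccessReviewSpec{
			User:                  tr.Status.User.Username,
			NonResourceAttributes: &authzv1.NonResourceAttributes{Path: "/metrics", Verb: "get"},
		}},
		metav1.CreateOptions{})
	if err != nil {
		return err
	}
	if !sar.Status.Allowed {
		return fmt.Errorf("user %q may not get /metrics", tr.Status.User.Username)
	}
	return nil
}

In this PR the static authorization entry in the kube-rbac-proxy config is intended to short-circuit the SubjectAccessReview for the openshift-monitoring prometheus-k8s service account, so on the expected scrape path only the token review should normally be needed.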
@@ -0,0 +1,16 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: null
  name: numaresources-prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
@@ -0,0 +1,13 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: null
  name: numaresources-prometheus-k8s
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: numaresources-prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
@@ -0,0 +1,9 @@
apiVersion: v1
kind: Secret
metadata:
  name: numaresources-secret-kube-rbac-proxy-metric
stringData:
  config.yaml: "\"authorization\":\n \"static\":\n - \"path\": \"/metrics\"\n \"resourceRequest\":
    false\n \"user\":\n \"name\": \"system:serviceaccount:openshift-monitoring:prometheus-k8s\"\n
    \ \"verb\": \"get\" "
type: Opaque
4 changes: 2 additions & 2 deletions config/default/kustomization.yaml
@@ -22,13 +22,13 @@ resources:
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
#- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus
- ../prometheus

patchesStrategicMerge:
# Protect the /metrics endpoint by putting it behind auth.
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, please comment the following line.
#- manager_auth_proxy_patch.yaml
- manager_auth_proxy_patch.yaml

# Mount the controller config file for loading manager configurations
# through a ComponentConfig type
35 changes: 28 additions & 7 deletions config/default/manager_auth_proxy_patch.yaml
@@ -9,20 +9,41 @@ spec:
template:
spec:
containers:
- name: manager
- name: kube-rbac-proxy
image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
image: gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0
args:
- "--secure-listen-address=0.0.0.0:8443"
- "--upstream=http://127.0.0.1:8080/"
- "--config-file=/etc/kube-rbac-proxy/config.yaml"
- "--tls-cert-file=/etc/tls/private/tls.crt"
- "--tls-private-key-file=/etc/tls/private/tls.key"
- "--allow-paths=/metrics"
- "--logtostderr=true"
- "-v=10"
ports:
- containerPort: 8443
protocol: TCP
name: https
- name: manager
args:
- "--platform=kubernetes"
- "--health-probe-bind-address=:8081"
- "--metrics-bind-address=127.0.0.1:8080"
- "--leader-elect"
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/kube-rbac-proxy
name: secret-kube-rbac-proxy-metric
readOnly: true
- mountPath: /etc/tls/private
name: secret-kube-rbac-proxy-tls
readOnly: true
serviceAccountName: controller-manager
terminationGracePeriodSeconds: 10
volumes:
- name: secret-kube-rbac-proxy-tls
secret:
secretName: secret-kube-rbac-proxy-tls
- name: secret-kube-rbac-proxy-metric
secret:
secretName: secret-kube-rbac-proxy-metric
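With this patch in place, nothing can read the manager's metrics without presenting a ServiceAccount token over TLS verified against the service CA, which is exactly how the ServiceMonitor in this PR is configured. Below is a rough Go sketch of such a client-side scrape, roughly what the curl check mentioned in the review or Prometheus itself performs; the CA path and service DNS name are assumptions based on this PR's manifests and OpenShift's in-pod ServiceAccount mounts and may differ in other environments.

// Sketch of a token-authenticated, CA-verified scrape of the proxied metrics
// endpoint; paths and the service DNS name below are illustrative assumptions.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"net/http"
	"os"
)

func scrapeMetrics() (string, error) {
	token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		return "", err
	}
	// On OpenShift the service CA is typically available next to the token;
	// Prometheus instead mounts it from the serving-certs-ca-bundle ConfigMap.
	caPEM, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt")
	if err != nil {
		return "", err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return "", fmt.Errorf("could not parse service CA bundle")
	}

	client := &http.Client{Transport: &http.Transport{
		// Verify the serving certificate issued by the service CA.
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}}
	req, err := http.NewRequest(http.MethodGet,
		"https://numaresources-controller-manager-metrics-service.numaresources.svc:8443/metrics", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+string(token))

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}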
1 change: 1 addition & 0 deletions config/manager/manager.yaml
@@ -5,6 +5,7 @@ metadata:
workload.openshift.io/allowed: management
labels:
control-plane: controller-manager
openshift.io/cluster-monitoring: "true"
name: system
---
apiVersion: apps/v1
5 changes: 4 additions & 1 deletion config/prometheus/kustomization.yaml
@@ -1,2 +1,5 @@
resources:
- monitor.yaml
- rbac.yaml
- secret-kube-rbac-proxy.yaml
# Please uncomment monitor.yaml to enable prometheus pods to scrape the metrics periodically.
# - monitor.yaml
24 changes: 14 additions & 10 deletions config/prometheus/monitor.yaml
@@ -1,20 +1,24 @@

# Prometheus Monitor Service (Metrics)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
control-plane: controller-manager
name: controller-manager-metrics-monitor
name: controller-manager
Member:
this is a pretty generic name for the system namespace; we can get away with the generic controller-manager name ONLY if we are in a numaresources namespace

Contributor Author:
But the system namespace will always be replaced with numaresources. Do you wish to return to controller-manager-metrics-monitor?

Member:
ok, if this is going to sit in the numaresources namespace, it's good

namespace: system
spec:
endpoints:
- path: /metrics
port: https
scheme: https
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
tlsConfig:
insecureSkipVerify: true
- interval: 30s
# Matches the name of the service's port.
targetPort: 8443
path: /metrics
scheme: https
bearerTokenFile: "/var/run/secrets/kubernetes.io/serviceaccount/token"
tlsConfig:
# The CA file used by Prometheus to verify the server's certificate.
# It's the cluster's CA bundle from the service CA operator.
caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
# The name of the server (CN) in the server's certificate.
serverName: numaresources-controller-manager-metrics-service.numaresources.svc
insecureSkipVerify: false
selector:
matchLabels:
control-plane: controller-manager
31 changes: 31 additions & 0 deletions config/prometheus/rbac.yaml
@@ -0,0 +1,31 @@
# creates Role and RoleBinding for prometheus-k8s service account to access our namespace
Member:
do we need this only in CI or in production in general?

Contributor Author:
For CI, which doesn't have Prometheus installed, we don't need this. However, for production (OCP), if we opt to allow Prometheus to scrape metrics by default, we should apply these RBACs.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: system
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
15 changes: 15 additions & 0 deletions config/prometheus/secret-kube-rbac-proxy.yaml
@@ -0,0 +1,15 @@
apiVersion: v1
kind: Secret
metadata:
  name: secret-kube-rbac-proxy-metric
  namespace: system
stringData:
  config.yaml: |-
    "authorization":
      "static":
      - "path": "/metrics"
        "resourceRequest": false
        "user":
          "name": "system:serviceaccount:openshift-monitoring:prometheus-k8s"
        "verb": "get"
type: Opaque
2 changes: 2 additions & 0 deletions config/rbac/auth_proxy_service.yaml
@@ -1,6 +1,8 @@
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.openshift.io/serving-cert-secret-name: secret-kube-rbac-proxy-tls
  labels:
    control-plane: controller-manager
  name: controller-manager-metrics-service
8 changes: 4 additions & 4 deletions config/rbac/kustomization.yaml
@@ -12,7 +12,7 @@ resources:
# Comment the following 4 lines if you want to disable
# the auth proxy (https://github.com/brancz/kube-rbac-proxy)
# which protects your /metrics endpoint.
#- auth_proxy_service.yaml
#- auth_proxy_role.yaml
#- auth_proxy_role_binding.yaml
#- auth_proxy_client_clusterrole.yaml
- auth_proxy_service.yaml
- auth_proxy_role.yaml
- auth_proxy_role_binding.yaml
- auth_proxy_client_clusterrole.yaml
Comment on lines +15 to +18

Member:
same question, are those for CI or for production?

Contributor Author (@rbaturov, Oct 10, 2024):
auth_proxy_role.yaml, auth_proxy_service.yaml and auth_proxy_role_binding.yaml are mandatory for the sidecar operation, meaning these three are needed for CI (for the curl tests) but also for production; without them the sidecar won't be ready. The Service is mandatory, as its annotation is responsible for creating the secret consumed as a volume by the sidecar.

Contributor Author:
However, auth_proxy_client_clusterrole.yaml is not being used at all.

Member:
ok, so let's not add auth_proxy_client_clusterrole.yaml or unnecessary stuff in general

Member:
Could you elaborate on "not be ready"? I'd expect the sidecars to be up and running, but not accessible by anyone in the cluster without these RBAC rules.

Contributor Author:
If the Service, for example, is not created, the secret that we mount as a volume won't be created either. That results in an error and the sidecar won't become ready.

Member:
ok, thanks. This is a possible problem for the backports, because it makes them more invasive than expected.

10 changes: 6 additions & 4 deletions main.go
@@ -71,10 +71,11 @@ const (
)

const (
defaultWebhookPort = 9443
defaultMetricsAddr = ":8080"
defaultProbeAddr = ":8081"
defaultNamespace = "numaresources-operator"
defaultWebhookPort = 9443
defaultMetricsAddr = ":8080"
defaultMetricsSupport = true
defaultProbeAddr = ":8081"
defaultNamespace = "numaresources-operator"
)

var (
@@ -129,6 +130,7 @@ func (pa *Params) SetDefaults() {
pa.probeAddr = defaultProbeAddr
pa.render.Namespace = defaultNamespace
pa.enableReplicasDetect = true
pa.enableMetrics = defaultMetricsSupport
}

func (pa *Params) FromFlags() {
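The new defaultMetricsSupport constant and the pa.enableMetrics default suggest that metrics exposure is now an explicit, toggleable option. Below is a hedged sketch of how such a knob could be wired through flags and translated into a controller-runtime metrics bind address; the flag name, field names, and helper are assumptions for illustration, not necessarily what FromFlags actually registers.

// Hypothetical wiring for the enableMetrics knob; the flag name, field names,
// and helper are illustrative, not necessarily what main.go actually does.
package main

import "flag"

const (
	defaultMetricsAddr    = ":8080"
	defaultMetricsSupport = true
)

// Params mirrors only the fields relevant to this sketch.
type Params struct {
	metricsAddr   string
	enableMetrics bool
}

func (pa *Params) FromFlags() {
	flag.BoolVar(&pa.enableMetrics, "enable-metrics", defaultMetricsSupport,
		"serve operator metrics on the metrics bind address")
	flag.StringVar(&pa.metricsAddr, "metrics-bind-address", defaultMetricsAddr,
		"the address the metrics endpoint binds to")
	flag.Parse()
}

// metricsBindAddress returns the value handed to controller-runtime's
// metrics options; "0" disables the built-in metrics endpoint entirely.
func (pa *Params) metricsBindAddress() string {
	if !pa.enableMetrics {
		return "0"
	}
	return pa.metricsAddr
}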