feat(example,metrics): kube-state-metrics to monitor custom resource … #10277

Closed

Conversation

@sebastiangaiser (Contributor) commented on Jun 28, 2024

…state

In order to monitor the state of custom resources (CRs) inside the Kubernetes cluster, kube-state-metrics can be deployed. This PR describes the deployment using the prometheus-community Helm chart.

Issue: #10276

Type of change

Select the type of your PR

  • Enhancement / new feature

Description

In order to monitor the state of custom resources (CRs) inside the Kubernetes cluster, kube-state-metrics can be deployed. This PR describes the deployment using the prometheus-community Helm chart.
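For illustration, a minimal sketch of what such Helm values could look like is shown below. The chart keys (`extraArgs`, `rbac.extraRules`, `customResourceState`) follow the upstream prometheus-community kube-state-metrics chart and should be verified against its values.yaml; the Kafka metric definition is only a hypothetical example, not the configuration shipped in this PR.

```yaml
# Sketch: values for the prometheus-community/kube-state-metrics chart that enable
# custom-resource state metrics for a Strimzi Kafka resource.
extraArgs:
  # Serve only the custom-resource metrics from this instance, keeping it separate
  # from the cluster-wide kube-state-metrics deployment.
  - --custom-resource-state-only=true

rbac:
  extraRules:
    # Allow this kube-state-metrics instance to watch Strimzi Kafka resources.
    - apiGroups: ["kafka.strimzi.io"]
      resources: ["kafkas"]
      verbs: ["list", "watch"]

customResourceState:
  enabled: true
  config:
    kind: CustomResourceStateMetrics
    spec:
      resources:
        - groupVersionKind:
            group: kafka.strimzi.io
            version: v1beta2
            kind: Kafka
          metrics:
            # Hypothetical example metric: basic info labels for each Kafka resource.
            - name: resource_info
              help: Information about the Kafka custom resource
              each:
                type: Info
                info:
                  labelsFromPath:
                    name: [metadata, name]
                    namespace: [metadata, namespace]
```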

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

…state

Issue: strimzi#10276
Signed-off-by: Sebastian Gaiser <[email protected]>
@scholzj linked an issue on Jun 28, 2024 that may be closed by this pull request.

@scholzj (Member) left a comment

Thanks for the PR. I think this is a good idea. I think we should consider some additional things here:

  • Do we want to document this? (I guess we have some notes on what is in the examples, so we should mention it there?)
  • Do we want to have a no-Helm variant of this in the examples? We generally have a pure YAML-based installation as the primary source, and many users do not use Helm, so a Helm-only example would be useful only for some users.
  • Should we provide this for all custom resources and deprecate (and later remove) the state metrics provided by the Strimzi Cluster Operator? (This might not be a task for this PR, but it should be considered when adopting this.)
  • Do we want to have some System Test coverage?

CC @strimzi/maintainers

@sebastiangaiser
Copy link
Contributor Author

Should we provide this for all custom resources

I think this would be a good idea, e.g. when reconciliation for a Kafka resource fails or a KafkaRebalance has problems. But I haven't run into any problems yet, nor am I sure which fields are relevant for all resources.

Do we want to have some no-Helm variant of this in the examples?

I think this depends on how you would like to deploy the monitoring. The Flux example uses the kube-prometheus-stack and injects the values there (into the Helm release). Personally, I'm not a big fan of mixing my monitoring stack with Flux-/Strimzi-/…-specific monitoring, so I ended up deploying a second kube-state-metrics instance for Flux and Strimzi (yes, I use Flux).
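For reference, injecting that configuration via Flux would look roughly like the sketch below; the HelmRelease API version and the `kube-state-metrics` subchart key are assumptions to be checked against the Flux and kube-prometheus-stack versions in use.

```yaml
# Sketch: Flux HelmRelease for kube-prometheus-stack that injects the
# custom-resource state configuration into its kube-state-metrics subchart.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
  namespace: monitoring
spec:
  interval: 1h
  chart:
    spec:
      chart: kube-prometheus-stack
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
  values:
    kube-state-metrics:
      customResourceState:
        enabled: true
        # The CustomResourceStateMetrics document from the example above goes here.
        config: {}
```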

@scholzj (Member) commented on Jun 29, 2024

I think this depends on how you would like to deploy the monitoring. The Flux example uses the kube-prometheus-stack and injects the values there (into the Helm release). Personally, I'm not a big fan of mixing my monitoring stack with Flux-/Strimzi-/…-specific monitoring, so I ended up deploying a second kube-state-metrics instance for Flux and Strimzi (yes, I use Flux).

Well, the Prometheus part seems to be just a custom resource that can have its own YAML as well. I also assume that the configuration will end up in some ConfigMap or environment variables to configure kube-state-metrics? So that can be described or stored in separate YAMLs.

I think this would be a good idea, e.g. when reconciliation for a Kafka resource fails or a KafkaRebalance has problems. But I haven't run into any problems yet, nor am I sure which fields are relevant for all resources.

I guess a start might be to replicate what the Cluster Operator does: a metric to indicate whether the resource is ready or not. That would allow us to drop it from the Cluster Operator. We can improve things later as ideas pop up. But we can wait on this for some discussion and have others chime in with their thoughts.
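For illustration, a no-Helm variant could keep the kube-state-metrics configuration in a separate ConfigMap that the Deployment mounts and passes via the `--custom-resource-state-config-file` flag; the "ready or not" gauge below is only a sketch of mirroring the Cluster Operator's readiness signal, not an agreed metric definition.

```yaml
# Sketch: plain-YAML variant — the CustomResourceStateMetrics configuration lives in a
# ConfigMap that is mounted into the kube-state-metrics Deployment and referenced with
# --custom-resource-state-config-file=/etc/ksm/config.yaml (flag per upstream docs).
apiVersion: v1
kind: ConfigMap
metadata:
  name: strimzi-custom-resource-state
  namespace: monitoring
data:
  config.yaml: |
    kind: CustomResourceStateMetrics
    spec:
      resources:
        - groupVersionKind:
            group: kafka.strimzi.io
            version: v1beta2
            kind: Kafka
          metricNamePrefix: strimzi_kafka
          metrics:
            # Exposes one time series per status condition; the Ready condition's
            # "True"/"False" value is converted to 1/0, mirroring the "is the
            # resource ready" signal the Cluster Operator reports today.
            - name: resource_status
              help: Status conditions of the Kafka custom resource
              each:
                type: Gauge
                gauge:
                  path: [status, conditions]
                  labelsFromPath:
                    type: ["type"]
                  valueFrom: ["status"]
```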

@ppatierno (Member) commented

Following are my thoughts ... I am not saying I am against it, just thinking aloud:

  • using kube-state-metrics adds one more component to the overall deployment, something we don't need today.
  • why is it better to deploy one more component to expose metrics when they are already exposed by the operator? Presumably because kube-state-metrics is more flexible when it comes to adding new metrics (change its configuration, no change in the operator code).
  • for sure we would need a way to deploy it without Helm charts.
  • for backward compatibility, the provided configuration should export the same metrics that we have today in the operator, no less; removing any of them could break users' systems. We can add more metrics in the future.
  • we should think about how the upgrade works when you move from a Strimzi version exposing these metrics out of the box to a newer version that needs kube-state-metrics to be deployed.

@scholzj (Member) commented on Jul 2, 2024

@ppatierno Keep in mind that this is not removing anything at this point. It just adds metrics similar to those we never implemented in the User Operator (UO) and Topic Operator (TO).

@scholzj (Member) commented on Sep 5, 2024

Discussed on the community call on 5.9.2024: this PR should be closed and the discussion should continue in issue #10276 and the related proposal. Thanks for opening this topic, @sebastiangaiser.

@scholzj closed this on Sep 5, 2024
@sebastiangaiser deleted the feature/cr-monitoring branch on September 5, 2024, at 17:48
Successfully merging this pull request may close these issues.

[Enhancement]: Monitoring of custom resources