feat(example,metrics): kube-state-metrics to monitor custom resource … #10277

Closed

Conversation

@sebastiangaiser (Contributor) commented on Jun 28, 2024

…state

In order to monitor the state of custom resources (CRs) inside the Kubernetes cluster, kube-state-metrics can be deployed. This PR describes the deployment using the prometheus-community Helm chart.

Issue: #10276

Type of change

Select the type of your PR

  • Enhancement / new feature

Description

In order to monitor the state of custom resources (CRs) inside the Kubernetes cluster, kube-state-metrics can be deployed. This PR describes the deployment using the prometheus-community Helm chart.
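For illustration, a minimal sketch of what such Helm values could look like is shown below. The chart keys (`extraArgs`, `rbac.extraRules`, `customResourceState`) follow the upstream prometheus-community kube-state-metrics chart and should be verified against its values.yaml; the Kafka metric definition is only a hypothetical example, not the configuration shipped in this PR.

```yaml
# Sketch: values for the prometheus-community/kube-state-metrics chart that enable
# custom-resource state metrics for a Strimzi Kafka resource.
extraArgs:
  # Serve only the custom-resource metrics from this instance, keeping it separate
  # from the cluster-wide kube-state-metrics deployment.
  - --custom-resource-state-only=true

rbac:
  extraRules:
    # Allow this kube-state-metrics instance to watch Strimzi Kafka resources.
    - apiGroups: ["kafka.strimzi.io"]
      resources: ["kafkas"]
      verbs: ["list", "watch"]

customResourceState:
  enabled: true
  config:
    kind: CustomResourceStateMetrics
    spec:
      resources:
        - groupVersionKind:
            group: kafka.strimzi.io
            version: v1beta2
            kind: Kafka
          metrics:
            # Hypothetical example metric: basic info labels for each Kafka resource.
            - name: resource_info
              help: Information about the Kafka custom resource
              each:
                type: Info
                info:
                  labelsFromPath:
                    name: [metadata, name]
                    namespace: [metadata, namespace]
```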

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

…state

Issue: strimzi#10276
Signed-off-by: Sebastian Gaiser <[email protected]>
@scholzj linked an issue on Jun 28, 2024 that may be closed by this pull request.

@scholzj (Member) left a comment

Thanks for the PR. I think this is a good idea. I think we should consider some additional things here:

  • Do we want to document this? (I guess we have some notes on what is in the examples, so we should mention it there?)
  • Do we want to have a no-Helm variant of this in the examples? We generally have a pure YAML-based installation as the primary source, and many users do not use Helm, so a Helm-only example would be useful only for some users.
  • Should we provide this for all custom resources and deprecate (and later remove) the state metrics provided by the Strimzi Cluster Operator? (This might not be a task for this PR, but it should be considered when adopting this.)
  • Do we want to have some System Test coverage?

CC @strimzi/maintainers

@sebastiangaiser
Copy link
Contributor Author

Should we provide this for all custom resources

I think this would be a good idea, e.g. when reconciliation for a Kafka resource fails or a KafkaRebalance has problems. But I haven't run into any problems yet, nor am I sure which fields are relevant for all resources.

Do we want to have some no-Helm variant of this in the examples?

I think this depends on how you would like to deploy the monitoring. The Flux example uses the kube-prometheus-stack and injects the values there (into the Helm release). Personally, I'm not a big fan of mixing my monitoring stack with Flux-/Strimzi-/…-specific monitoring, so I ended up deploying a second kube-state-metrics instance for Flux and Strimzi (yes, I use Flux).
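For reference, injecting that configuration via Flux would look roughly like the sketch below; the HelmRelease API version and the `kube-state-metrics` subchart key are assumptions to be checked against the Flux and kube-prometheus-stack versions in use.

```yaml
# Sketch: Flux HelmRelease for kube-prometheus-stack that injects the
# custom-resource state configuration into its kube-state-metrics subchart.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
  namespace: monitoring
spec:
  interval: 1h
  chart:
    spec:
      chart: kube-prometheus-stack
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
  values:
    kube-state-metrics:
      customResourceState:
        enabled: true
        # The CustomResourceStateMetrics document from the example above goes here.
        config: {}
```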

@scholzj (Member) commented on Jun 29, 2024

I think this depends on how you would like to deploy the monitoring. The Flux example uses the kube-prometheus-stack and injects the values there (into the Helm release). Personally, I'm not a big fan of mixing my monitoring stack with Flux-/Strimzi-/…-specific monitoring, so I ended up deploying a second kube-state-metrics instance for Flux and Strimzi (yes, I use Flux).

Well, the Prometheus part seems to be just a custom resource that can have its own YAML as well. I also assume that the configuration will end up in some ConfigMap or environment variables to configure kube-state-metrics? So that can be described or stored in separate YAMLs.

I think this would be a good idea, e.g. when reconciliation for a Kafka resource fails or a KafkaRebalance has problems. But I haven't run into any problems yet, nor am I sure which fields are relevant for all resources.

I guess a start might be to replicate what the Cluster Operator does: a metric to indicate whether the resource is ready or not. That would allow us to drop it from the Cluster Operator. We can improve things later as ideas pop up. But we can wait on this for some discussion and have others chime in with their thoughts.
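For illustration, a no-Helm variant could keep the kube-state-metrics configuration in a separate ConfigMap that the Deployment mounts and passes via the `--custom-resource-state-config-file` flag; the "ready or not" gauge below is only a sketch of mirroring the Cluster Operator's readiness signal, not an agreed metric definition.

```yaml
# Sketch: plain-YAML variant — the CustomResourceStateMetrics configuration lives in a
# ConfigMap that is mounted into the kube-state-metrics Deployment and referenced with
# --custom-resource-state-config-file=/etc/ksm/config.yaml (flag per upstream docs).
apiVersion: v1
kind: ConfigMap
metadata:
  name: strimzi-custom-resource-state
  namespace: monitoring
data:
  config.yaml: |
    kind: CustomResourceStateMetrics
    spec:
      resources:
        - groupVersionKind:
            group: kafka.strimzi.io
            version: v1beta2
            kind: Kafka
          metricNamePrefix: strimzi_kafka
          metrics:
            # Exposes one time series per status condition; the Ready condition's
            # "True"/"False" value is converted to 1/0, mirroring the "is the
            # resource ready" signal the Cluster Operator reports today.
            - name: resource_status
              help: Status conditions of the Kafka custom resource
              each:
                type: Gauge
                gauge:
                  path: [status, conditions]
                  labelsFromPath:
                    type: ["type"]
                  valueFrom: ["status"]
```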

@ppatierno (Member) commented

Following are my thoughts ... I am not saying I am against it, just thinking aloud:

  • using kube-state-metrics adds one more component to the overall deployment, something we don't need today.
  • why is it better to deploy one more component to expose metrics when they are already exposed by the operator? Presumably because kube-state-metrics is more flexible when it comes to adding new metrics (change its configuration, no change in the operator code).
  • for sure we would need a way to deploy it without Helm charts.
  • for backward compatibility, the provided configuration should export the same metrics that we have today in the operator, no less; removing any of them could break users' systems. We can add more metrics in the future.
  • we should think about how the upgrade works when you move from a Strimzi version exposing these metrics out of the box to a newer version that needs kube-state-metrics to be deployed.

@scholzj (Member) commented on Jul 2, 2024

@ppatierno Keep in mind that this is not removing anything at this point. It just adds metrics similar to those we never implemented in the User Operator (UO) and Topic Operator (TO).

@scholzj (Member) commented on Sep 5, 2024

Discussed on the community call on 5.9.2024: this PR should be closed and the discussion should continue in issue #10276 and the related proposal. Thanks for opening this topic, @sebastiangaiser.

@scholzj closed this on Sep 5, 2024
@sebastiangaiser deleted the feature/cr-monitoring branch on September 5, 2024, at 17:48
Successfully merging this pull request may close these issues.

[Enhancement]: Monitoring of custom resources