The issue with the current Prometheus support
Currently, the operator annotates Redis and Sentinel pods with a set of annotations and expects the user of the operator to write a Prometheus scrape config that uses these annotations to discover the pods and configure scraping.
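For reference, the annotations follow the common `prometheus.io` convention; the exact keys and port value below are illustrative assumptions (based on the redis_exporter defaults) and should be checked against the operator source:

```yaml
# Illustrative only: the annotation keys and port are assumptions,
# not copied from the operator's actual output.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9121"     # default redis_exporter port
    prometheus.io/path: "/metrics"
```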
Writing a scrape config is tedious and low-level, so most people prefer the high-level PodMonitor and ServiceMonitor alternatives. Also, sometimes the operator maintainer doesn't have access to the Prometheus configuration (even through additional scrape configs), for example, when they are delegating Prometheus setup and maintenance to another team inside or outside their organization.
The alternatives
We can use PodMonitors and ServiceMonitors instead. But there are a number of ways to use them, and we need to decide which one is best for the target users of this operator.
We can create a single PodMonitor/ServiceMonitor that selects all Redis and Sentinel pods of all the RedisFailover instances, but that limits our ability to customize the scraping configuration per RedisFailover instance (maybe some need to be scraped every second, while a 1-minute interval suffices for the rest).
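As a sketch, such a single PodMonitor could look like the following; the label selector is hypothetical, since the operator would need to put one common label on all exporter pods:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: all-redisfailovers
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: redis-operator  # hypothetical common label
  podMetricsEndpoints:
    - port: metrics   # assumes the exporter container port is named "metrics"
      interval: 30s   # one interval for every instance, hence the limitation
```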
We can make it more configurable by having one PodMonitor per RedisFailover. It fixes the above problem, but in most cases, Sentinels don't need to be scraped as often as Redises.
Creating one PodMonitor for Sentinels and one for Redises per RedisFailover fixes that. But maybe we need to scrape the Redis master pod more often than the slaves.
And finally, we can have three PodMonitors per RedisFailover: one for Redis master, one for Redis slaves, and one for Sentinels. The problem with this solution is that it makes the cluster very crowded, especially if, due to an organization standard, all Redis instances must be created in a single designated Kubernetes namespace.
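For example, with a hypothetical per-pod role label, the finest-grained variant would look roughly like this for the master (and similarly for slaves and Sentinels), tripling the number of objects per RedisFailover:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myredis-master
spec:
  selector:
    matchLabels:
      # Both labels are assumptions for illustration, not the operator's
      # actual label keys.
      redisfailovers.databases.spotahome.com/name: myredis
      redisfailovers-role: master
  podMetricsEndpoints:
    - port: metrics
      interval: 5s   # scrape the master more often than slaves and Sentinels
```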
My suggestion
I believe the best course of action is to start with the simplest alternative and then let the community experiment with their ideas and discuss how to mature this feature.
At first sight, the simplest alternative seems to be the single PodMonitor for all RedisFailovers, but it prevents the community from experimenting with customized scraping. We all have critical RedisFailovers that we don't want to touch unless absolutely necessary, and not-so-important RedisFailovers that fall victim to all our experiments, including the playful ones. If we have a single PodMonitor for all RedisFailovers, we won't experiment on it because doing so would also affect the critical ones. So, to let the community run all kinds of experiments, we need separate PodMonitor(s) for each RedisFailover.
If we decide to have a single PodMonitor per RedisFailover, we will face a number of problems in the RedisFailover CRD. To make it easier to follow, I reproduce the RedisFailover spec struct and some of its component structs here.
In order to have a single PodMonitor for the entire RedisFailover, we would have to add a PodMonitor field to RedisFailoverSpec, since it belongs to the entire RedisFailover, not just the Redis or Sentinel part. The sole function of this PodMonitor would be to connect Prometheus with the endpoints created by the Redis and Sentinel exporters.
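Under that design, the user-facing YAML would look roughly like this (all field names under podMonitor are hypothetical):

```yaml
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: myredis
spec:
  podMonitor:        # hypothetical: one PodMonitor for the whole RedisFailover
    enabled: true
    interval: 30s
  redis:
    exporter:
      enabled: true
  sentinel:
    exporter:
      enabled: false  # the podMonitor above silently covers only the Redis exporter
```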
Both of these exporters can be disabled, in which case creating the PodMonitor CR is futile and possibly confusing. Also, if only one of them is enabled, the PodMonitor will connect only that one to Prometheus, and it won't be clear which one unless the maintainer checks the RedisFailover or the pod containers, which is again a little confusing.
Also, if you disable the PodMonitor and enable both Exporters, they won't be connected to Prometheus. Since the functionality of the PodMonitor and the Exporters is so intertwined, from a design standpoint it makes more sense to put them closer to each other and preferably let them share a few subfields, for example, Enabled.
The easiest way to get rid of this poor design and the resulting confusion is to have one PodMonitor for Redises and one for Sentinels, add a PodMonitor subfield to Exporter, and let the Exporter's Enabled field control the creation of the PodMonitor.
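In the suggested design, the spec would instead read roughly as follows (the podMonitor subfield and its contents are hypothetical):

```yaml
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: myredis
spec:
  redis:
    exporter:
      enabled: true    # also controls creation of the PodMonitor below
      podMonitor:      # hypothetical subfield
        interval: 5s
  sentinel:
    exporter:
      enabled: true
      podMonitor:
        interval: 1m   # Sentinels scraped less often than Redises
```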
Issues #383 and #494 have discussed the same problem but didn't reach a resolution. I tried to take those discussions into account when writing this issue and to take a step toward a resolution.
I have also implemented my suggested design and tested it in production; see #624. It is a rather large PR because the operator isn't designed for such changes to be frequent, and adding a CR with a new GVK (GroupVersionKind) requires changes all over the codebase. But this PR paves the way for adding more Prometheus-related configs to the RedisFailover CRD, for example, custom PrometheusRules for each RedisFailover.