Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSSM-3374 Add servicemesh_members metric #1119

Open
wants to merge 18 commits into
base: maistra-2.4
Choose a base branch
from

Conversation

bmangoen
Copy link
Contributor

We want to add the Service Mesh Control Plane Namespaces counter metric that would be exposed by Telemetry

@jewertow
Copy link
Member

@bmangoen could you rebase this PR?

@dgn
Copy link
Contributor

dgn commented Mar 28, 2023

how best to test this? it looks like it's integrated with cluster-monitoring, but I don't know how to verify that

Copy link

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to implement a Collector interface which is less error-prone from my experience than setting/deleting the metric on the fly.
Here is an example of such implementation.

@bmangoen
Copy link
Contributor Author

how best to test this? it looks like it's integrated with cluster-monitoring, but I don't know how to verify that

@dgn You can check these metrics from the OpenShift web console Observe > Metrics but you first need to label the operator namespace oc label ns openshift-operators openshift.io/cluster-monitoring=true (it will be done later at the operator install via OLM)

@bmangoen
Copy link
Contributor Author

bmangoen commented Apr 5, 2023

/retest

@luksa
Copy link
Contributor

luksa commented Apr 17, 2023

@bmangoen is this now ready to be merged or does it need more work? is the metric name now correct?

@bmangoen
Copy link
Contributor Author

@luksa It is ready to be merged IMO. I renamed the metric service_member_count as recommended servicemesh_members.

@rcernich
Copy link
Contributor

@bmangoen , it looks like this needs to be rebased

@jewertow
Copy link
Member

jewertow commented Dec 9, 2023

I would like to have metrics, which will answer to the following questions:

  1. How many SMCPs of a particular version are deployed.
  2. How many tenants our customers have per cluster.
  3. Average, maximum and total number of namespaces included in a mesh.

Current metrics do not provide enough information and we can only conclude which SMCP versions are still used, but not how many of a particular version and how many tenants users have per cluster, because metrics are summed up. Additionally, SMCP version is irrelevant in the context of namespaces included in the mesh, so this will create unnecessary time series.

I suggest to collect the following metrics:

  1. servicemesh_controlplanes{version,mode,namespace} - we will not export the namespace, but it's necessary to create unique metrics for each SMCP controller.
  2. servicemesh_members{mode,namespace} - namespace will not be exported but it's necessary to create a unique metrics for each tenant.

And then we have to create the following PrometheusRule:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: service-mesh-2
  namespace: openshift-operators
spec:
  groups:
  - name: service-mesh-2-control-planes
    rules:
    - expr: |-
        sum by (version,mode) (servicemesh_controlplanes)
      record: 'cluster:servicemesh_controlplanes:sum'
  - name: service-mesh-2-members
    rules:
    - expr: |-
        avg by (mode) (servicemesh_members)
      record: 'cluster:servicemesh_members:avg'
     - expr: |-
        max by (mode) (servicemesh_members)
      record: 'cluster:servicemesh_members:max'
     - expr: |-
        sum by (mode) (servicemesh_members)
      record: 'cluster:servicemesh_members:sum'

Copy link

openshift-ci bot commented Jun 12, 2024

@bmangoen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/unit 1a85505 link true /test unit
ci/prow/lint 1a85505 link true /test lint
ci/prow/gencheck 1a85505 link true /test gencheck

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@bmangoen
Copy link
Contributor Author

bmangoen commented Aug 9, 2024

I would like to have metrics, which will answer to the following questions:

  1. How many SMCPs of a particular version are deployed.
  2. How many tenants our customers have per cluster.
  3. Average, maximum and total number of namespaces included in a mesh.

Current metrics do not provide enough information and we can only conclude which SMCP versions are still used, but not how many of a particular version and how many tenants users have per cluster, because metrics are summed up. Additionally, SMCP version is irrelevant in the context of namespaces included in the mesh, so this will create unnecessary time series.

I suggest to collect the following metrics:

  1. servicemesh_controlplanes{version,mode,namespace} - we will not export the namespace, but it's necessary to create unique metrics for each SMCP controller.
  2. servicemesh_members{mode,namespace} - namespace will not be exported but it's necessary to create a unique metrics for each tenant.

And then we have to create the following PrometheusRule:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: service-mesh-2
  namespace: openshift-operators
spec:
  groups:
  - name: service-mesh-2-control-planes
    rules:
    - expr: |-
        sum by (version,mode) (servicemesh_controlplanes)
      record: 'cluster:servicemesh_controlplanes:sum'
  - name: service-mesh-2-members
    rules:
    - expr: |-
        avg by (mode) (servicemesh_members)
      record: 'cluster:servicemesh_members:avg'
     - expr: |-
        max by (mode) (servicemesh_members)
      record: 'cluster:servicemesh_members:max'
     - expr: |-
        sum by (mode) (servicemesh_members)
      record: 'cluster:servicemesh_members:sum'

@jewertow Does this still make sense to implement?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants