
Pods metrics endpoint #829

Closed
SSvilen opened this issue Nov 25, 2021 · 21 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@SSvilen

SSvilen commented Nov 25, 2021

By default the pod metrics are scraped from the following kubelet endpoint:
/metrics/cadvisor

which is more or less empty on Windows machines.
I could work around that problem by creating a new ServiceMonitor that scrapes the metrics from the /metrics/resource endpoint:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: kubelet-windows
    k8s-app: kubelet-windows
  name: kubelet-windows
  namespace: openshift-monitoring
spec:
  endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      bearerTokenSecret:
        key: ""
      honorLabels: true
      honorTimestamps: false
      interval: 30s
      path: /metrics/resource
      port: https-metrics
      relabelings:
        # Relabel metrics_path to /metrics/cadvisor so the series match the
        # queries the console already uses.
        - sourceLabels:
            - __metrics_path__
          replacement: "/metrics/cadvisor"
          targetLabel: metrics_path
        # The resource endpoint exposes no image label; add a placeholder.
        - replacement: dummy
          targetLabel: image
      scheme: https
      scrapeTimeout: 30s
      tlsConfig:
        ca: {}
        caFile: /etc/prometheus/configmaps/kubelet-serving-ca-bundle/ca-bundle.crt
        cert: {}
  jobLabel: job
  targetLabels:
    - job
    - metrics_path
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet-windows
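
(A minimal sketch of applying and verifying this ServiceMonitor; the file name here is hypothetical:)

oc apply -f kubelet-windows-servicemonitor.yaml
oc -n openshift-monitoring get servicemonitor kubelet-windows

The new kubelet-windows targets should then show up on the Prometheus Targets page in the console.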

Now the CPU and memory consumption are visible in the OpenShift Console.
Or is there another (built-in) way to achieve that?

@aravindhp
Contributor

@SSvilen please provide your cluster details. Is this on the same cluster? What does oc adm top pod return? Please provide a console screenshot.

@SSvilen
Author

SSvilen commented Nov 26, 2021

@aravindhp ,

sorry, I believe I didn't describe the problem correctly. top has always worked, because of the metrics server.
The problem is the representation in the console, more specifically the deployment metrics dashboard:
[screenshot of the deployment metrics dashboard]

For the pod CPU metrics I modified the cluster-monitoring-operator-prometheus-rules PrometheusRule so that the expression falls back to the Windows kubelet's pod_cpu_usage_seconds_total when the cAdvisor series is absent:

 - expr: sum(rate(container_cpu_usage_seconds_total{container="",pod!=""}[5m])
        or rate(pod_cpu_usage_seconds_total{pod!=""}[5m])) BY (pod, namespace)

Now I have some metrics in the Console:
[screenshot of pod CPU metrics in the Console]

For the memory it seems less trivial, because under Windows the metric for the whole pod is called:

pod_memory_working_set_bytes

So the problem is not critical, but it would be nice if at some point the same kind of features were available for both Linux and Windows pods.
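
(For illustration, an untested sketch of an analogous memory expression using the same fallback pattern as the CPU rule; whether the label sets line up in practice is an assumption:)

 - expr: sum(container_memory_working_set_bytes{container="",pod!=""}
        or pod_memory_working_set_bytes{pod!=""}) BY (pod, namespace)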

@mansikulkarni96
Member

@SSvilen thanks for opening the issue.
You are right, the pod metrics for Linux come from cAdvisor. However, we do not get the same metrics for Windows, so we plan to get them from the Windows exporter running on the node. This is something we have in our pipeline, and we plan to add support for displaying pod graphs in the console soon.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Feb 27, 2022
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 29, 2022
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot closed this as completed Apr 28, 2022
@openshift-ci
Contributor

openshift-ci bot commented Apr 28, 2022

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@MattPOlson

This still seems to be an issue; is there a plan to fix it in a future release? I was able to get metrics by creating the new ServiceMonitor as described above and creating the following two new rules in the windows-prometheus-k8s-rules PrometheusRule instance:

    - expr: >-
        sum(rate(container_cpu_usage_seconds_total{pod!=""}[5m])) BY (pod,
        namespace)
      record: 'pod:container_cpu_usage:sum'
    - expr: |
        pod_memory_working_set_bytes
      record: container_memory_working_set_bytes
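
(For context, a hypothetical sketch of how these two rules might sit inside a PrometheusRule resource; the group name and namespace are assumptions:)

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: windows-prometheus-k8s-rules
  namespace: openshift-monitoring
spec:
  groups:
    - name: windows.pod.rules  # hypothetical group name
      rules:
        - expr: >-
            sum(rate(container_cpu_usage_seconds_total{pod!=""}[5m])) BY (pod,
            namespace)
          record: 'pod:container_cpu_usage:sum'
        - expr: |
            pod_memory_working_set_bytes
          record: container_memory_working_set_bytes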

@mansikulkarni96
Member

@MattPOlson yes, there is a plan to add support for pod metrics; it is being tracked in JIRA here: https://issues.redhat.com/browse/WINC-568

@sebsoto
Contributor

sebsoto commented Aug 3, 2022

/reopen
So we can track this on GitHub as well.

@openshift-ci
Contributor

openshift-ci bot commented Aug 3, 2022

@sebsoto: Reopened this issue.

In response to this:

/reopen
So we can track this on GitHub as well.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci bot reopened this Aug 3, 2022
@sebsoto
Contributor

sebsoto commented Aug 3, 2022

/remove-lifecycle rotten

openshift-ci bot removed the lifecycle/rotten label Aug 3, 2022
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Nov 2, 2022
@sebsoto
Contributor

sebsoto commented Nov 2, 2022

/remove-lifecycle stale

openshift-ci bot removed the lifecycle/stale label Nov 2, 2022
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Feb 1, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 3, 2023
@alinaryan
Contributor

/remove-lifecycle rotten

openshift-ci bot removed the lifecycle/rotten label Mar 13, 2023
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Jun 12, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 12, 2023
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot closed this as completed Aug 12, 2023
@openshift-ci
Contributor

openshift-ci bot commented Aug 12, 2023

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
