
Pods metrics endpoint #829

Closed
SSvilen opened this issue Nov 25, 2021 · 21 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@SSvilen

SSvilen commented Nov 25, 2021

By default the pod metrics are scraped from the following kubelet endpoint:
/metrics/cadvisor

which is more or less empty on Windows machines.
I could work around that problem by creating a new ServiceMonitor that scrapes the metrics from the /metrics/resource endpoint:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: kubelet-windows
    k8s-app: kubelet-windows
  name: kubelet-windows
  namespace: openshift-monitoring
spec:
  endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      bearerTokenSecret:
        key: ""
      honorLabels: true
      honorTimestamps: false
      interval: 30s
      path: /metrics/resource
      port: https-metrics
      relabelings:
        # Relabel metrics_path to /metrics/cadvisor so the series match the
        # queries the console already uses.
        - sourceLabels:
            - __metrics_path__
          replacement: "/metrics/cadvisor"
          targetLabel: metrics_path
        # The resource endpoint exposes no image label; add a placeholder.
        - replacement: dummy
          targetLabel: image
      scheme: https
      scrapeTimeout: 30s
      tlsConfig:
        ca: {}
        caFile: /etc/prometheus/configmaps/kubelet-serving-ca-bundle/ca-bundle.crt
        cert: {}
  jobLabel: job
  targetLabels:
    - job
    - metrics_path
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet-windows
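
(A minimal sketch of applying and verifying this ServiceMonitor; the file name here is hypothetical:)

oc apply -f kubelet-windows-servicemonitor.yaml
oc -n openshift-monitoring get servicemonitor kubelet-windows

The new kubelet-windows targets should then show up on the Prometheus Targets page in the console.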

Now the CPU and memory consumption are visible in the OpenShift Console.
Or is there another (built-in) way to achieve that?

@aravindhp
Contributor

@SSvilen please provide your cluster details. Is this on the same cluster? What does oc adm top pod return? Please provide a console screenshot.

@SSvilen
Author

SSvilen commented Nov 26, 2021

@aravindhp ,

sorry, I believe I didn't describe the problem correctly. top has always worked, because of the metrics server.
The problem is the representation in the console, more specifically the deployment metrics dashboard:
[screenshot of the deployment metrics dashboard]

For the pod CPU metrics I modified the cluster-monitoring-operator-prometheus-rules PrometheusRule so that the expression falls back to the Windows kubelet's pod_cpu_usage_seconds_total when the cAdvisor series is absent:

 - expr: sum(rate(container_cpu_usage_seconds_total{container="",pod!=""}[5m])
        or rate(pod_cpu_usage_seconds_total{pod!=""}[5m])) BY (pod, namespace)

Now I have some metrics in the Console:
[screenshot of pod CPU metrics in the Console]

For the memory it seems less trivial, because under Windows the metric for the whole pod is called:

pod_memory_working_set_bytes

So the problem is not critical, but it would be nice if at some point the same kind of features were available for both Linux and Windows pods.
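
(For illustration, an untested sketch of an analogous memory expression using the same fallback pattern as the CPU rule; whether the label sets line up in practice is an assumption:)

 - expr: sum(container_memory_working_set_bytes{container="",pod!=""}
        or pod_memory_working_set_bytes{pod!=""}) BY (pod, namespace)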

@mansikulkarni96
Member

@SSvilen thanks for opening the issue.
You are right, the pod metrics for Linux come from cAdvisor. However, we do not get the same metrics for Windows, so we plan to get them from the Windows exporter running on the node. This is something we have in our pipeline, and we plan to add support for displaying pod graphs in the console soon.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Feb 27, 2022
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 29, 2022
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot closed this as completed Apr 28, 2022
@openshift-ci
Contributor

openshift-ci bot commented Apr 28, 2022

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@MattPOlson

This still seems to be an issue; is there a plan to fix it in a future release? I was able to get metrics by creating the new ServiceMonitor as described above and creating the following two new rules in the windows-prometheus-k8s-rules PrometheusRule instance:

    - expr: >-
        sum(rate(container_cpu_usage_seconds_total{pod!=""}[5m])) BY (pod,
        namespace)
      record: 'pod:container_cpu_usage:sum'
    - expr: |
        pod_memory_working_set_bytes
      record: container_memory_working_set_bytes
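
(For context, a hypothetical sketch of how these two rules might sit inside a PrometheusRule resource; the group name and namespace are assumptions:)

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: windows-prometheus-k8s-rules
  namespace: openshift-monitoring
spec:
  groups:
    - name: windows.pod.rules  # hypothetical group name
      rules:
        - expr: >-
            sum(rate(container_cpu_usage_seconds_total{pod!=""}[5m])) BY (pod,
            namespace)
          record: 'pod:container_cpu_usage:sum'
        - expr: |
            pod_memory_working_set_bytes
          record: container_memory_working_set_bytes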

@mansikulkarni96
Member

@MattPOlson yes, there is a plan to add support for pod metrics; it is being tracked in JIRA here: https://issues.redhat.com/browse/WINC-568

@sebsoto
Contributor

sebsoto commented Aug 3, 2022

/reopen
So we can track this on GitHub as well.

@openshift-ci
Contributor

openshift-ci bot commented Aug 3, 2022

@sebsoto: Reopened this issue.

In response to this:

/reopen
So we can track this on GitHub as well.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci bot reopened this Aug 3, 2022
@sebsoto
Contributor

sebsoto commented Aug 3, 2022

/remove-lifecycle rotten

openshift-ci bot removed the lifecycle/rotten label Aug 3, 2022
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Nov 2, 2022
@sebsoto
Contributor

sebsoto commented Nov 2, 2022

/remove-lifecycle stale

openshift-ci bot removed the lifecycle/stale label Nov 2, 2022
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Feb 1, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 3, 2023
@alinaryan
Contributor

/remove-lifecycle rotten

openshift-ci bot removed the lifecycle/rotten label Mar 13, 2023
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Jun 12, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 12, 2023
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot closed this as completed Aug 12, 2023
@openshift-ci
Contributor

openshift-ci bot commented Aug 12, 2023

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
