
Fix pod metrics in Lens for Prometheus as they display an incorrect value that is 2x the actual value #7679

Open
dragoangel opened this issue May 4, 2023 · 19 comments
Labels
area/metrics (All the things related to metrics), enhancement (New feature or request)

Comments

@dragoangel

What would you like to be added:
Could you add an option that will allow passing custom query parameters for metrics requests?

Why is this needed:
This is required for configuring aspects of monitoring, for example passing a timeout parameter to Prometheus. Also, not having the option to set query parameters when Lens is pointed at solutions like Thanos or Prometheus HA leads to metrics being displayed incorrectly. Lens will show duplicated data from the HA replicas, e.g. pod CPU and RAM usage will be multiplied by the number of replicas in HA. To display data correctly and not fail on partial responses, dedup=1&partial_response=1 would help, but unfortunately Lens does not accept query parameters in PROMETHEUS SERVICE ADDRESS and does not have a separate field to add them.
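For illustration (a sketch of the desired behaviour, not an existing Lens setting), the idea is that the metrics request Lens sends could end up looking something like this, with the extra parameters appended to the Thanos query API call:

    https://thanos-query.example.com/api/v1/query_range?query=<promql>&start=<ts>&end=<ts>&step=60&dedup=1&partial_response=1

Here thanos-query.example.com is a placeholder for whatever is configured as PROMETHEUS SERVICE ADDRESS; dedup and partial_response are regular Thanos Query API parameters.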

Environment you are running the Lens application on:

  • Kubernetes distribution: Rancher
  • Metrics Prometheus type: Prometheus Operator
  • Metrics Prometheus service: Thanos Query Frontend that queries Prometheus in HA
  • Desktop OS: Any
@dragoangel dragoangel added the enhancement (New feature or request) label May 4, 2023
@dragoangel
Author

dragoangel commented May 4, 2023

Hm, after testing I actually found that the reason for the multiplied CPU and RAM usage on a pod, compared to the sum of its containers, is not related to Thanos usage or query params, because from what I tested:

  1. Thanos deduplicates data by default.
  2. On another setup without sharding or Thanos, the issue reproduces in the same way.

So this is a bug. I will try to debug tomorrow, query by query, to see what is wrong.

import type { PrometheusProvider } from "./provider";

@dragoangel
Author

dragoangel commented May 5, 2023

I found the issue at:

case "pods":
switch (queryName) {
case "cpuUsage":
return `sum(rate(container_cpu_usage_seconds_total{pod=~"${opts.pods}", namespace="${opts.namespace}"}[${rateAccuracy}])) by (${opts.selector})`;
case "cpuRequests":
return `sum(kube_pod_container_resource_requests{pod=~"${opts.pods}", resource="cpu", namespace="${opts.namespace}"}) by (${opts.selector})`;
case "cpuLimits":
return `sum(kube_pod_container_resource_limits{pod=~"${opts.pods}", resource="cpu", namespace="${opts.namespace}"}) by (${opts.selector})`;
case "memoryUsage":
return `sum(container_memory_working_set_bytes{pod=~"${opts.pods}", namespace="${opts.namespace}"}) by (${opts.selector})`;
case "memoryRequests":
return `sum(kube_pod_container_resource_requests{pod=~"${opts.pods}", resource="memory", namespace="${opts.namespace}"}) by (${opts.selector})`;
case "memoryLimits":
return `sum(kube_pod_container_resource_limits{pod=~"${opts.pods}", resource="memory", namespace="${opts.namespace}"}) by (${opts.selector})`;
case "fsUsage":
return `sum(container_fs_usage_bytes{pod=~"${opts.pods}", namespace="${opts.namespace}"}) by (${opts.selector})`;
case "fsWrites":
return `sum(rate(container_fs_writes_bytes_total{pod=~"${opts.pods}", namespace="${opts.namespace}"}[${rateAccuracy}])) by (${opts.selector})`;
case "fsReads":
return `sum(rate(container_fs_reads_bytes_total{pod=~"${opts.pods}", namespace="${opts.namespace}"}[${rateAccuracy}])) by (${opts.selector})`;
case "networkReceive":
return `sum(rate(container_network_receive_bytes_total{pod=~"${opts.pods}", namespace="${opts.namespace}"}[${rateAccuracy}])) by (${opts.selector})`;
case "networkTransmit":
return `sum(rate(container_network_transmit_bytes_total{pod=~"${opts.pods}", namespace="${opts.namespace}"}[${rateAccuracy}])) by (${opts.selector})`;
}

We need to add container!="" to these queries.
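For illustration, a sketch of how the two usage queries could look with that filter added (same opts and rateAccuracy as in the excerpt above; this is the suggested change, not the current Lens code):

    case "cpuUsage":
      // container!="" drops the pod-level cgroup series that cAdvisor exports with an empty container label,
      // so it is no longer counted on top of the per-container series
      return `sum(rate(container_cpu_usage_seconds_total{container!="", pod=~"${opts.pods}", namespace="${opts.namespace}"}[${rateAccuracy}])) by (${opts.selector})`;
    case "memoryUsage":
      return `sum(container_memory_working_set_bytes{container!="", pod=~"${opts.pods}", namespace="${opts.namespace}"}) by (${opts.selector})`;

Without the filter the sum includes both the pod-level series (container="") and the per-container series, which is where the roughly 2x values come from.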

@Nokel81 sorry for bothering you, but can you please check this? Thank you in advance.

@Nokel81
Collaborator

Nokel81 commented May 5, 2023

I think we need to add a toggle for that, because we have added and then removed such a filter several times.

@Nokel81 Nokel81 added the area/metrics (All the things related to metrics) label May 5, 2023
@dragoangel
Author

dragoangel commented May 5, 2023

@Nokel81 you mean that previously there was a filter like sum(kube_pod_container_resource_limits{container!="", pod=~"${opts.pods}", resource="cpu", namespace="${opts.namespace}"}) by (${opts.selector}), but it was removed due to some issue?

P.S. In general, adding the option to pass custom query params would be a nice feature, even though it is not as critical as I initially thought when creating this issue 😊

@dragoangel dragoangel changed the title from "Allow adding Custom query parameters to Metrics service" to "Fix pod metrics in Lens for Prometheus as they display an incorrect value that is 2x the actual value" May 5, 2023
@oleksandr-selezniov

Hi. I'm observing doubled metric plots for pods too. And I found out that container!="" won't help in my case, as the cause of the duplication is two datasets with a different service label in the Prometheus output. Those services are "kubelet" and "prometheus-kube-prometheus-kubelet", and both datasets fit the condition container!="".

@dragoangel
Author

dragoangel commented May 9, 2023

Those services are "kubelet" and "prometheus-kube-prometheus-kubelet"

I think this is an issue with your jobs in Prometheus: you are collecting the same metrics twice. You need to properly set up your Prometheus stack.
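A quick way to check for that kind of duplication (a rough diagnostic sketch; adjust the namespace and label names to your setup) is to count how many series each container produces and which service they come from:

    count by (service, pod, container) (container_memory_working_set_bytes{namespace="<your-namespace>"})

If the same pod/container pair shows up under more than one service (e.g. kubelet and prometheus-kube-prometheus-kubelet), the kubelet/cAdvisor endpoint is being scraped by two jobs and one of them should be dropped.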

@vitaliyf

Same as #7679

@Tantino

Tantino commented May 29, 2023

Same as #7679

@jkroepke
Contributor

jkroepke commented May 29, 2023

Container metrics are fine for me while pod metrics are 2x. It may depend on the installed CRI. On an AKS install, container!="" would help.

[screenshot]

Running from AKS with kube-prometheus-stack.

@dragoangel
Author

Container metrics are fine for me while pod metrics are 2x. It may depend on the installed CRI. On an AKS install, container!="" would help.

[screenshot]

Running from AKS with kube-prometheus-stack.

Yes, this is exactly what I mentioned.

@jkroepke
Contributor

Looking forward to #7777

@jkroepke
Contributor

I can confirm that if I run minikube with the docker driver, the metric container_memory_working_set_bytes does not have a container label.

It depends on the CRI runtime whether the metric container_memory_working_set_bytes has the container label or not.

@Nokel81 That could be the reason why users had some trouble in the past.

If container_memory_working_set_bytes has a container label, then container!="" should be added. If not, then container!="" should not be present.

#7777 would be the best solution for all.
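A quick way to check which case a given cluster falls into (just a diagnostic sketch, not part of the Lens code):

    # series that carry a non-empty container label
    count(container_memory_working_set_bytes{container!=""})
    # all series for the metric
    count(container_memory_working_set_bytes)

If the first query returns no data while the second does, that runtime's cAdvisor metrics have no container label, and a hard-coded container!="" filter would hide the pod metrics entirely.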

@dragoangel
Author

I can confirm that if I run minikube with the docker driver, the metric container_memory_working_set_bytes does not have a container label.

It depends on the CRI runtime whether the metric container_memory_working_set_bytes has the container label or not.

@Nokel81 That could be the reason why users had some trouble in the past.

If container_memory_working_set_bytes has a container label, then container!="" should be added. If not, then container!="" should not be present.

#7777 would be the best solution for all.

If you don't have a container label, having a PromQL expression with {container!=""} will not break your query...

@dragoangel
Author

dragoangel commented Jun 19, 2023

After the latest Lens update (v2023.5.310801), per-container metrics are fully broken 99% of the time :( All of CPU/RAM/Filesystem say:
Metrics not available at the moment. Another problem is that downgrading Lens is not an easy thing to do...

@dragoangel
Author

dragoangel commented Jul 10, 2023

Just for other people who struggle with the latest version issues: I downgraded to OpenLens 6.4.15 to get stable monitoring tabs in the Node view, Pod view, etc. All newer versions do not function properly. It still has the issue described here, but at least other things are not broken.

After the latest Lens update (v2023.5.310801), per-container metrics are fully broken 99% of the time :( All of CPU/RAM/Filesystem say: Metrics not available at the moment. Another problem is that downgrading Lens is not an easy thing to do...

@dragoangel
Author

Any updates on this issue?

@jkroepke
Contributor

@dragoangel I have the feeling that Lens will now be developed closed source. I would not expect anything here.

@dragoangel
Author

dragoangel commented Oct 20, 2023

@dragoangel I have the feeling that Lens will now be developed closed source. I would not expect anything here.

Yeah, you are totally right: the latest version of Lens is 2023.10.181418 and there are no releases here, which is bad :(. And this version still has the same issues as were reported:

  1. 2x metrics
  2. metrics for containers not available 99% of the time

@dark-brains

Try to check your Kubernetes services; I think there are services that are duplicated, in the kube-system namespace, services like kubelet.
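For example (commands are illustrative; exact resource names depend on your install):

    # look for more than one service exposing kubelet metrics
    kubectl -n kube-system get svc | grep kubelet
    # with prometheus-operator, check for duplicate ServiceMonitors targeting the kubelet
    kubectl get servicemonitors -A | grep -i kubelet

If two scrape configs select the same kubelet endpoints, cAdvisor metrics are collected twice and every pod graph doubles.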
