Disk IO stats for processes #1910

Closed
immanuelfodor opened this issue Dec 11, 2020 · 5 comments


@immanuelfodor

Note: I tried to join the Freenode room with a Matrix client, but the room was so complex that it didn't let me in.

Note: I searched for similar issues but found none. Issue #1891 referred to the https://github.com/prometheus/procfs repo, so my question might belong there, too. I'm happy to move it there if you think so.

I would like to access per-process IO info in Prometheus, the way atop/iotop show it, so that I can see which processes are doing heavy IO on the available disks. AFAIK, only per-disk stats are currently available in Prometheus: for example, sdb is doing X IOPS, but there is no info about which processes contribute to that value.
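For context, Linux exposes per-process IO counters in `/proc/<pid>/io` (fields such as `read_bytes` and `write_bytes`). The following is a minimal sketch of reading them, not something node_exporter does; the function names are made up for illustration, and reading other users' processes usually requires root.

```python
from pathlib import Path


def parse_proc_io(text: str) -> dict:
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_all_process_io(proc_root: str = "/proc") -> dict:
    """Return {pid: counters} for every process whose io file is readable."""
    result = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            result[int(entry.name)] = parse_proc_io((entry / "io").read_text())
        except OSError:
            # The process exited in the meantime, or we lack permission.
            continue
    return result
```

The counters are cumulative since process start, so a single read only shows totals; rates need two reads and a subtraction.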

I also came across https://github.com/ncabatoff/process-exporter, which might solve this problem, but (correct me if I'm wrong) it extracts info only for a few listed process names. However, I don't know beforehand which process names to monitor.

My Kubernetes cluster is already instrumented with prometheus-operator, and therefore with node_exporter, so it would be great if it could provide more visibility into disk IO.

@immanuelfodor immanuelfodor changed the title IO stats for processes Disk IO stats for processes Dec 11, 2020
@SuperQ
Member

SuperQ commented Dec 11, 2020

Per-process metrics are out of scope for the node_exporter, as it's intended for host metrics.

You're probably looking for container metrics from something like the Kubelet or cAdvisor.

@immanuelfodor
Author

Thanks for the suggestion, I'll look around these two.

@gabrielmusskopf

Hi @immanuelfodor! I'm facing the same situation you described. Did you find any solution? If so, can you post your strategy or workaround? Thanks

@immanuelfodor
Author

The closest I got is the container_fs_* metrics, for example:

image

But it still doesn't answer what is happening on the node. It's just about the containers, no host processes are shown.
As you can see on the graph, the node FS is used more, but there is no stat on what is using it:

image
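For readers who want to reproduce the container-level view, here is a rough sketch of ranking pods by write throughput through the Prometheus HTTP API. `container_fs_writes_bytes_total` is one of the cAdvisor `container_fs_*` metrics; the `namespace` and `pod` label names, the default window, and the base URL passed in are assumptions that depend on the cluster's scrape configuration.

```python
import json
import urllib.parse
import urllib.request


def top_writers_query(window: str = "5m", limit: int = 10) -> str:
    """Build a PromQL expression ranking pods by container filesystem write rate."""
    # Assumption: the scrape config attaches "namespace" and "pod" labels.
    return (
        f"topk({limit}, sum by (namespace, pod) "
        f"(rate(container_fs_writes_bytes_total[{window}])))"
    )


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run an instant query against /api/v1/query and return the result list."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]
```

As noted above, a query like this only covers containers; IO from host processes outside any container does not show up in it.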

With CheckMK, Zabbix, or the like, it might be possible to monitor this from the host rather than with Prometheus, but that would be another tool whose data I'd need to correlate manually. So I gave up on the process level, although it would be great for debugging.

Then, since I can't monitor IO per process, I moved on to limiting it, but that's not possible either, even as of now. There was at least some progress on this side:
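As an illustration of what a host-side debugging view boils down to, the sketch below ranks processes by IO rate between two snapshots of cumulative per-process counters (for example, values taken from `/proc/<pid>/io`). The function name and the snapshot layout (`{pid: {"write_bytes": ...}}`) are hypothetical, chosen only for this example.

```python
def top_io_processes(before, after, interval_seconds, field="write_bytes", limit=5):
    """Rank PIDs by bytes per second of `field` between two counter snapshots."""
    rates = []
    for pid, counters in after.items():
        previous = before.get(pid)
        if previous is None:
            continue  # process started between snapshots, so there is no baseline
        delta = counters.get(field, 0) - previous.get(field, 0)
        if delta > 0:
            # Caveat: a reused PID can give a misleading delta; ignored in this sketch.
            rates.append((pid, delta / interval_seconds))
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates[:limit]
```

Called with two snapshots taken a few seconds apart, it returns a list of `(pid, bytes_per_second)` pairs, largest first.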

But we still don't have a real solution for kubernetes/kubernetes#92287 as these PRs were closed without merge:

I'm unaware of any working solution to either monitor it or limit it from k8s.
If you or somebody else bumps into something that works, please let me know 🤞

@gabrielmusskopf

Unfortunately, process-exporter doesn't plan to add k8s support, as mentioned in PR #94, which could have solved this issue. Yes, I'll post an update if I find something. Thanks for answering!
