Disk IO stats for processes #1910

immanuelfodor · 2020-12-11T13:09:24Z

Note: I tried to join the Freenode room with a Matrix client, but the room was so complex that it didn't let me in.

Note: I searched for similar issues but found none. Issue #1891 referred to the https://github.com/prometheus/procfs repo, so my question might belong there, too. I'm happy to move this question there if you think so.

I would like to access per-process IO info in Prometheus, like atop/iotop provide, so that I can see which processes are doing heavy IO on the available disks. AFAIK, only per-disk stats are currently available in Prometheus: for example, sdb is doing X IOPS, but there is no info about which processes contribute to that value.
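To illustrate the kind of data meant here: on Linux, per-process IO counters are typically exposed in /proc/<pid>/io (fields such as read_bytes and write_bytes), provided the kernel has task IO accounting enabled. Below is a minimal sketch of reading them and ranking processes by bytes written between two snapshots. It assumes a Linux host and enough privileges to read other processes' files. The function names are hypothetical and are not part of node_exporter or any other exporter.

```python
import os
from typing import Dict


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_all_process_io(proc_root: str = "/proc") -> Dict[int, Dict[str, int]]:
    """Collect IO counters for every readable PID under proc_root."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                result[int(entry)] = parse_proc_io(f.read())
        except OSError:
            # The process exited or we lack permission: skip it.
            continue
    return result


def io_delta(before, after, field="write_bytes"):
    """Return {pid: delta} for PIDs present in both snapshots, largest first."""
    deltas = {
        pid: after[pid].get(field, 0) - before[pid].get(field, 0)
        for pid in after
        if pid in before
    }
    return dict(sorted(deltas.items(), key=lambda kv: kv[1], reverse=True))
```

Taking two snapshots with read_all_process_io() a few seconds apart and passing them to io_delta() gives a rough per-process view of who wrote the most in that interval. Exposing this as Prometheus metrics would create one series per PID, which can get expensive on busy hosts.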

I also came across https://github.com/ncabatoff/process-exporter, which might solve this problem, but (correct me if I'm wrong) it only extracts info about a few listed process names. However, I don't know beforehand which process names to monitor.

My Kubernetes cluster is already instrumented with prometheus-operator, and therefore with node_exporter, so it would be great if it could provide more visibility into disk IO.

SuperQ · 2020-12-11T14:12:07Z

Per-process metrics are out of scope for the node_exporter, as it's intended for host metrics.

You're probably looking for container metrics from something like the Kubelet or cAdvisor.

immanuelfodor · 2020-12-16T12:07:12Z

Thanks for the suggestion, I'll look around these two.

gabrielmusskopf · 2023-09-05T13:00:44Z

Hi @immanuelfodor! I'm facing the same situation you described. Did you find any solution? If so, can you post your strategy or workaround? Thanks

immanuelfodor · 2023-09-05T16:43:10Z

The closest I got is the container_fs_* metrics, for example:

But it still doesn't answer what is happening on the node. It's just about the containers, no host processes are shown.
As you can see on the graph, the node FS is used more, but there is no stat on what is using it:

With CheckMK, Zabbix, or the like, it might be possible to monitor this from the host rather than with Prometheus, but that would be another tool whose data I'd need to correlate manually. So I gave up on process-level monitoring, although it would be great for debugging.

Since I couldn't monitor IO per process, I moved on to the topic of limiting it, but that is also not possible, even as of now. There was at least some progress on this side:

But we still don't have a real solution for kubernetes/kubernetes#92287 as these PRs were closed without merge:

I'm unaware of any working solution to either monitor it or limit it from k8s.
If you or somebody else bumps into something that works, please let me know 🤞

gabrielmusskopf · 2023-09-06T12:25:15Z

Unfortunately, the process-exporter doesn't plan to add k8s support, as mentioned in this PR #94, which could solve this issue. Yes, I'll update if I find something. Thanks for answering!

immanuelfodor changed the title ~~IO stats for processes~~ Disk IO stats for processes Dec 11, 2020

discordianfish closed this as completed Dec 16, 2020
