Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hyper-V CPUWaitTimePerDispatch metrics value looks like not correct #1782

Open
mahumac opened this issue Nov 29, 2024 · 8 comments
Open

Hyper-V CPUWaitTimePerDispatch metrics value looks like not correct #1782

mahumac opened this issue Nov 29, 2024 · 8 comments

Comments

@mahumac
Copy link

mahumac commented Nov 29, 2024

Problem Statement

我的英语不好,问题描述是使用google翻译的。

( My English is not good, the problem description is translated using Google. )

# HELP windows_hyperv_vm_cpu_wait_time_per_dispatch_total Time in nanoseconds waiting for a virtual processor to be dispatched onto a logical processor
# TYPE windows_hyperv_vm_cpu_wait_time_per_dispatch_total counter
windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="0",vm="Prometheus"} 6.65257352e+09
windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="1",vm="Prometheus"} 5.835522719e+09
windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="2",vm="Prometheus"} 5.591472665e+09
windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="3",vm="Prometheus"} 5.293499635e+09

可以看到获取到的 metric windows_hyperv_vm_cpu_wait_time_per_dispatch_total类型是 counter
( You can see that the type of the metric windows_hyperv_vm_cpu_wait_time_per_dispatch_total is counter. )

windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="0",vm="Prometheus"} 的值为6.65257352e+09, 换算成十进制数为 6652573520 ns ,约等于 66.53 μs。
(The value of windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="0",vm="Prometheus"} is 6.65257352e+09, which is converted to a decimal number of 6652573520 ns, which is approximately equal to 66.53 μs.)

在grafana中查看:
( View it in grafana: )
image

6652573520这个值找不到,但是可以发现这个值 位于 20:02:20 -- 20:02:30时间段之间。
( The value of 6652573520 cannot be found, but it can be found that this value is between 20:02:20--20:02:30. )

image

在 time series 图中,也可以看到windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="0",vm="Prometheus"}的值是一个不断增长的counter类型。
( In the time series chart, you can also see that the value of windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="0",vm="Prometheus"} is a counter type that keeps growing. )

但是在windows perfmon性能查看器中查看其实际值,显示如下:
(However, when viewing its actual value in the windows perfmon viewer, it is shown as follows:)

image

windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="0",vm="Prometheus"} 的值看上去应该是gauge类型,而不是 counter 类型。真实的值在 11.94 μs ~ 58.98 μs 之间,平均值 大约等于 27.60 μs 。
(The value of windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="0",vm="Prometheus"} looks like a gauge, not a counter type. The actual value is between 11.94 μs and 58.98 μs, and the average value is about 27.60 μs.)

Environment

  • windows_exporter Version: 0.29.2
  • Windows Server Version: 2025
@jkroepke
Copy link
Member

jkroepke commented Nov 29, 2024

windows_hyperv_vm_cpu_wait_time_per_dispatch_total represents the RAW metric. You have to use rate(windows_hyperv_vm_cpu_wait_time_per_dispatch_total[1m]) to get the CPU usage back.

@mahumac
Copy link
Author

mahumac commented Nov 29, 2024

关于这个metric的文档太少了, 没有详细的解释文档。还是怀疑这个metric值的正确性。

(There is too little documentation about this metric, and no detailed explanation document. I still doubt the correctness of this metric value.)

[What is normal range for CPU Wait time Per Dispatch on Hyper-V 2012 | Microsoft Learn](https://learn.microsoft.com/en-us/archive/msdn-technet-forums/cc2cf7c7-8056-45e4-907e-3089852af442)

[Confusion about Hyper-V counter: CPU Wait Time Per Dispatch... : r/sysadmin](https://www.reddit.com/r/sysadmin/comments/s9a438/confusion_about_hyperv_counter_cpu_wait_time_per/)

[Windows performance counter "CPU Wait Time Per Dispatch" return wrong value. · Issue #1528 · influxdata/telegraf](influxdata/telegraf#1528)

Question 1:

windows_hyperv_vm_cpu_wait_time_per_dispatch_total{core="0",vm="Prometheus"} 6.65257352e+09

represents the RAW metric. ( RAW value = 6.65257352e+09 )

but the value of 6652573520 cannot be found , it can be found that this value is between 20:02:20--20:02:30. ( my prometheus scrape_interval = 10s)

Question 2:

image

I got the result with rate(windows_hyperv_vm_cpu_wait_time_per_dispatch_total[1m]) , the average value is about 141 μs。

but, the value in the Windows Perfmon viewer is around 27 μs, which is much less than 141 μs

@jkroepke
Copy link
Member

windows_exporter Version: 0.29.2

Would it be possible to switch to https://github.com/prometheus-community/windows_exporter/releases/tag/v0.30.0-rc.1 ?

0.29 uses Performance Counter Values through WMI, which may could be differ.

But keep in might, the HyperV metric names has been changed: #1712

For example:
windows_hyperv_vm_cpu_wait_time_per_dispatch_total -> windows_hyperv_hypervisor_virtual_processor_cpu_wait_time_per_dispatch_total

@mahumac
Copy link
Author

mahumac commented Nov 29, 2024

使用 windows_exporter-0.30.0-rc.1-amd64之后, 出现了2个新的故障。
(After using windows_exporter-0.30.0-rc.1-amd64, 2 new failures appeared.)

Q1

After running the windows_exporter-0.30.0-rc.1-amd64.msi installer, the windows_exporter service cannot be started.

Execute .\windows_exporter.exe --config.file=config.yaml --log.file=log.txt --log.level=debug in cmd(as an administrator account), and it can run normally.

Q2

After running normally, some Hyper-V Performance Counter cannot be found,the error message is as follows:

time=2024-11-30T00:21:13.860+08:00 level=INFO source=main.go:268 msg="Shutting down windows_exporter via kill signal"
time=2024-11-30T00:21:13.861+08:00 level=INFO source=main.go:286 msg="windows_exporter has shut down"
time=2024-11-30T00:21:18.957+08:00 level=INFO source=config.go:79 msg="Loading configuration file: config.yaml"
time=2024-11-30T00:21:18.968+08:00 level=DEBUG source=main.go:188 msg="Logging has Started"
time=2024-11-30T00:21:18.969+08:00 level=DEBUG source=main.go:327 msg="setting process priority to normal"
time=2024-11-30T00:21:18.973+08:00 level=WARN source=perfdata.go:105 msg="The perfdata collector is in an experimental state! The configuration may change in future. Please report any issues."
time=2024-11-30T00:21:18.973+08:00 level=WARN source=os.go:108 msg="The os collect holds a number of deprecated metrics and will be removed mid 2025. See https://github.com/prometheus-community/windows_exporter/pull/1596 for more information." collector=os
time=2024-11-30T00:21:18.973+08:00 level=INFO source=textfile.go:108 msg="textfile directories: C:\\Program Files\\windows_exporter\\textfile_inputs" collector=textfile
time=2024-11-30T00:21:19.319+08:00 level=WARN source=main.go:210 msg="couldn't initialize collector" err="error build collector hyperv: failed to build hypervisor_root_virtual_processor collector: failed to create Hyper-V Hypervisor Root Virtual Processor collector: failed to initialize collector: failed to add counter \\Hyper-V Hypervisor Root Virtual Processor(*)\\% Guest Idle Time: 找不到指定的计数器。\r\n\nfailed to build hypervisor_virtual_processor collector: failed to create Hyper-V Hypervisor Virtual Processor collector: failed to initialize collector: failed to add counter \\Hyper-V Hypervisor Virtual Processor(*)\\% Guest Idle Time: 找不到指定的计数器。\r\n"
time=2024-11-30T00:21:19.322+08:00 level=INFO source=main.go:301 msg="Running as MY-NUC\\Administrator"
time=2024-11-30T00:21:19.322+08:00 level=INFO source=main.go:218 msg="Enabled collectors: cpu, cpu_info, diskdrive, logical_disk, net, os, perfdata, physical_disk, service, system, textfile, hyperv, logon"
time=2024-11-30T00:21:19.323+08:00 level=INFO source=main.go:236 msg="starting windows_exporter in 366.1032ms" version=0.30.0-rc.1 branch=HEAD revision=a2db81494ecec90c11900bf08a826f1b99ad61e7 goversion=go1.23.3 builddate=20241128-16:35:04 maxprocs=8
time=2024-11-30T00:21:19.324+08:00 level=INFO source=tls_config.go:347 msg="Listening on" address=[::]:9182
time=2024-11-30T00:21:19.324+08:00 level=INFO source=tls_config.go:350 msg="TLS is disabled." http2=false address=[::]:9182
time=2024-11-30T00:21:22.109+08:00 level=DEBUG source=collect.go:220 msg="collector perfdata succeeded after 543.6µs, resulting in 0 metrics"
time=2024-11-30T00:21:22.110+08:00 level=DEBUG source=collect.go:220 msg="collector textfile succeeded after 900.7µs, resulting in 0 metrics"
time=2024-11-30T00:21:22.112+08:00 level=DEBUG source=collect.go:220 msg="collector os succeeded after 3.3827ms, resulting in 11 metrics"
time=2024-11-30T00:21:22.113+08:00 level=DEBUG source=collect.go:220 msg="collector logon succeeded after 4.4608ms, resulting in 14 metrics"
time=2024-11-30T00:21:22.114+08:00 level=DEBUG source=collect.go:220 msg="collector physical_disk succeeded after 4.9426ms, resulting in 24 metrics"
time=2024-11-30T00:21:22.115+08:00 level=DEBUG source=collect.go:220 msg="collector cpu succeeded after 5.9662ms, resulting in 153 metrics"
time=2024-11-30T00:21:22.115+08:00 level=DEBUG source=collect.go:220 msg="collector system succeeded after 6.4733ms, resulting in 8 metrics"
time=2024-11-30T00:21:22.115+08:00 level=DEBUG source=collect.go:220 msg="collector logical_disk succeeded after 6.4733ms, resulting in 119 metrics"
time=2024-11-30T00:21:22.120+08:00 level=DEBUG source=collect.go:220 msg="collector cpu_info succeeded after 10.9763ms, resulting in 7 metrics"
time=2024-11-30T00:21:22.123+08:00 level=DEBUG source=collect.go:220 msg="collector net succeeded after 13.6905ms, resulting in 28 metrics"
time=2024-11-30T00:21:22.123+08:00 level=DEBUG source=collect.go:220 msg="collector hyperv succeeded after 13.6905ms, resulting in 522 metrics"
time=2024-11-30T00:21:22.126+08:00 level=DEBUG source=collect.go:220 msg="collector diskdrive succeeded after 17.1681ms, resulting in 72 metrics"
time=2024-11-30T00:21:22.136+08:00 level=DEBUG source=collect.go:220 msg="collector service succeeded after 26.9188ms, resulting in 3173 metrics"

time=2024-11-30T00:21:32.120+08:00 level=DEBUG source=collect.go:220 msg="collector perfdata succeeded after 0s, resulting in 0 metrics"
time=2024-11-30T00:21:32.120+08:00 level=DEBUG source=collect.go:220 msg="collector textfile succeeded after 802.1µs, resulting in 0 metrics"
time=2024-11-30T00:21:32.121+08:00 level=DEBUG source=collect.go:220 msg="collector os succeeded after 1.3368ms, resulting in 11 metrics"
time=2024-11-30T00:21:32.123+08:00 level=DEBUG source=collect.go:220 msg="collector physical_disk succeeded after 3.1517ms, resulting in 24 metrics"
time=2024-11-30T00:21:32.123+08:00 level=DEBUG source=collect.go:220 msg="collector cpu succeeded after 3.1517ms, resulting in 153 metrics"
time=2024-11-30T00:21:32.125+08:00 level=DEBUG source=collect.go:220 msg="collector system succeeded after 4.3596ms, resulting in 8 metrics"
time=2024-11-30T00:21:32.127+08:00 level=DEBUG source=collect.go:220 msg="collector logon succeeded after 7.4212ms, resulting in 14 metrics"
time=2024-11-30T00:21:32.129+08:00 level=DEBUG source=collect.go:220 msg="collector hyperv succeeded after 8.9944ms, resulting in 522 metrics"
time=2024-11-30T00:21:32.130+08:00 level=DEBUG source=collect.go:220 msg="collector logical_disk succeeded after 9.9927ms, resulting in 119 metrics"
time=2024-11-30T00:21:32.132+08:00 level=DEBUG source=collect.go:220 msg="collector net succeeded after 12.1378ms, resulting in 28 metrics"
time=2024-11-30T00:21:32.133+08:00 level=DEBUG source=collect.go:220 msg="collector cpu_info succeeded after 13.2985ms, resulting in 7 metrics"
time=2024-11-30T00:21:32.139+08:00 level=DEBUG source=collect.go:220 msg="collector diskdrive succeeded after 19.4167ms, resulting in 72 metrics"
time=2024-11-30T00:21:32.149+08:00 level=DEBUG source=collect.go:220 msg="collector service succeeded after 29.5446ms, resulting in 3173 metrics"

Tip

This Chinese character 找不到指定的计数器 means The specified counter cannot be found

My OS information is as follows:

PS C:\Program Files\windows_exporter> systeminfo.exe

主机名:             MY-NUC
OS 名称:            Microsoft Windows Server 2025 Datacenter
OS 版本:            10.0.26100 暂缺 Build 26100
OS 制造商:          Microsoft Corporation
OS 配置:            独立服务器
OS 构建类型:        Multiprocessor Free
注册的所有人:       暂缺
注册的组织:         暂缺
产品 ID:            00491-50000-00001-AA686
初始安装日期:       2024/11/4, 3:54:08
系统启动时间:       2024/11/29, 4:50:39
系统制造商:         Intel(R) Client Systems
系统型号:           NUC8i5BEH
系统类型:           x64-based PC
处理器:             安装了 1 个处理器。
                    [01]: Intel64 Family 6 Model 142 Stepping 10 GenuineIntel ~2304 Mhz
BIOS 版本:          Intel Corp. BECFL357.86A.0095.2023.0918.1953, 2023/9/18
Windows 目录:       C:\Windows
系统目录:           C:\Windows\system32
启动设备:           \Device\HarddiskVolume1
系统区域设置:       zh-cn;中文(中国)
输入法区域设置:     zh-cn;中文(中国)
时区:               (UTC+08:00) 北京,重庆,香港特别行政区,乌鲁木齐
物理内存总量:       32,629 MB
可用的物理内存:     25,169 MB
虚拟内存: 最大值:   37,493 MB
虚拟内存: 可用:     30,486 MB
虚拟内存: 使用中:   7,007 MB
页面文件位置:       C:\pagefile.sys
域:                 WORKGROUP
登录服务器:         \\MY-NUC
修补程序:           安装了 3 个修补程序。
                    [01]: KB5045934
                    [02]: KB5046617
                    [03]: KB5047621
网卡:               安装了 3 个 NIC。
                    [01]: Intel(R) Ethernet Connection (6) I219-LM
                        连接名:      以太网
                        启用 DHCP:   否
                        IP 地址
                    [02]: Intel(R) Wireless-AC 9560 160MHz
                        连接名:      WLAN
                        启用 DHCP:   是
                        DHCP 服务器: 192.168.50.1
                        IP 地址
                          [01]: 192.168.50.235
                          [02]: fe80::d180:2977:69e5:48b4
                    [03]: Hyper-V Virtual Ethernet Adapter
                        连接名:      vEthernet (vSwitch)
                        启用 DHCP:   否
                        IP 地址
                          [01]: 192.168.50.5
基于虚拟化的安全性: 状态: 正在运行
                    所需的安全属性:
                    可用的安全属性:
                          基本的虚拟化支持
                          安全启动
                          DMA 保护
                          安全内存覆盖
                          UEFI 代码只读
                          基于模式的执行控制
                    Services Configured:
                    Services Running:
                    App Control for Business policy: 强制
                    App Control for Business user mode policy: 关
                    已启用安全功能:
Hyper-V 要求:       已检测到虚拟机监控程序。将不显示 Hyper-V 所需的功能。
PS C:\Program Files\windows_exporter>

@jkroepke
Copy link
Member

jkroepke commented Nov 29, 2024

Q1: Check the events logs. I guess you defined some EXTRA_FLAGS on the msi package.

Q2:

Microsoft Windows Server 2025 Datacenter

The collector was tested on 2022 and 2019. It could be possible that some counter are not longer exists on 2025.

0.29.0 fetches metrics via WMI and don't care about missing counters. Since Performance Counters are used as source which are a bit more strict.

You could typeperf -qx to check, what counter exists.

@mahumac
Copy link
Author

mahumac commented Nov 29, 2024

Q1: Check the events logs. I guess you defined some EXTRA_FLAGS on the msi package.

I manually deleted the windows_exporter service, restarted the computer, and then freshly installed windows_exporter-0.30.0-rc.1-amd64.msi, which prompted the following error:
image

  • event id 7009: 等待 windows_exporter 服务的连接超时(30000 毫秒)。
  • event id 7000: 由于下列错误,windows_exporter 服务启动失败: 服务没有及时响应启动或控制请求。

I installed it by manually double-clicking the .msi file, enabled the firewall exception in the GUI interface, and kept all other options as default.

Q2: Microsoft Windows Server 2025 Datacenter

\Hyper-V Hypervisor Virtual Processor(Prometheus:Hv VP 0)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Virtual Processor(Prometheus:Hv VP 1)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Virtual Processor(Prometheus:Hv VP 2)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Virtual Processor(Prometheus:Hv VP 3)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Virtual Processor(_Total)\CPU Wait Time Per Dispatch
... ... 
\Hyper-V Hypervisor Root Virtual Processor(Root VP 0)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Root Virtual Processor(Root VP 1)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Root Virtual Processor(Root VP 2)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Root Virtual Processor(Root VP 3)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Root Virtual Processor(Root VP 4)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Root Virtual Processor(Root VP 5)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Root Virtual Processor(Root VP 6)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Root Virtual Processor(Root VP 7)\CPU Wait Time Per Dispatch
\Hyper-V Hypervisor Root Virtual Processor(_Total)\CPU Wait Time Per Dispatch

I have exported all typeperf on winserver 2025, Hyper-V Performance Counter does exist,please see the attachment.
typeperf.txt

@jkroepke
Copy link
Member

jkroepke commented Nov 29, 2024

Seems like % Guest Idle Time is absent on Windows Server 2025.

I have to fix that first.

@jkroepke
Copy link
Member

v0.30.0-rc.2 would now work for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants