WindowsPerf is a (Linux perf inspired) Windows on Arm performance profiling tool. Profiling is based on the ARM64 PMU and its hardware counters. WindowsPerf supports the counting model for obtaining aggregate counts of occurrences of special events, and the sampling model for determining the frequencies of event occurrences produced by program locations at the function, basic block, and/or instruction levels.
WindowsPerf can instrument Arm CPU performance counters. As of now, it can collect:
- Core PMU counters for all or specified CPU cores.
- Uncore PMU counters:
  - Arm DynamIQ Shared Unit (DSU) PMU
  - DMC-520 Dynamic Memory Controller PMU
- Arm Statistical Profiling Extension (SPE).
Currently we support:
- counting model: WindowsPerf can utilize the Performance Monitoring Unit (PMU) counters from the CPU, DSU, and DMC to capture detailed counting profiles of workloads. By leveraging these counters, WindowsPerf can monitor various performance metrics and events, providing insights into the behavior and efficiency of the system. This comprehensive profiling helps in identifying bottlenecks, optimizing performance, and ensuring that workloads are running efficiently across different components of the system. You can find examples here.
- sampling model: WindowsPerf can sample CPU Performance Monitoring Unit (PMU) events using two methods: software sampling and hardware sampling. In software sampling, the process is triggered by a PMU counter overflow interrupt request (IRQ), allowing the system to collect data at specific intervals. On the other hand, hardware sampling with the Arm Statistical Profiling Extension (SPE) provides precise sampling directly in hardware. This method captures detailed performance data without the overhead associated with software-based sampling, resulting in more accurate and reliable measurements. You can find examples here.
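
To make the difference between the two models concrete, here is a minimal Python sketch. It is a toy simulation, not WindowsPerf code: the `run_workload` function, the trace format, and the location names are all hypothetical. The counting model keeps one aggregate total, while the software sampling model records the current program location each time the simulated counter overflows.

```python
from collections import Counter


def run_workload(trace, sample_period):
    """Replay a trace under both models (toy simulation, hypothetical format).

    trace: iterable of (location, events) pairs, where `events` is the number
    of PMU events generated while executing at `location`.
    sample_period: number of events between simulated counter overflows.
    """
    total = 0            # counting model: one aggregate count
    counter = 0          # sampling model: counter that overflows periodically
    samples = Counter()  # sampling model: overflow hits per location
    for location, events in trace:
        total += events
        counter += events
        # Each overflow stands in for the PMU overflow IRQ: attribute one
        # sample to the location that was executing at that moment.
        while counter >= sample_period:
            counter -= sample_period
            samples[location] += 1
    return total, samples


# Made-up trace: two hypothetical functions generating events.
trace = [("parse", 300), ("hash", 900), ("parse", 200), ("hash", 600)]
total, samples = run_workload(trace, sample_period=100)
print(total)          # aggregate count, as in the counting model
print(dict(samples))  # per-location distribution, as in the sampling model
```

The counting result says how many events occurred; the sampling result says where they occurred. A shorter sample period gives a finer distribution at the cost of more interrupts.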
WindowsPerf integrates with the Arm Telemetry Solution to support performance analysis on Windows on Arm. The integration is based primarily on PMU (Performance Monitoring Unit) events, which give detailed insight into system performance. WindowsPerf implements the Arm Topdown Methodology for μarch (microarchitecture) performance analysis. The methodology is tailored to each Arm CPU μarch and uses PMU events, metrics, and groups of metrics to analyze system performance. WindowsPerf can also detect the platform μarchitecture, including Neoverse N1, V1, and N2 CPUs.
The Arm Telemetry Solution also includes topdown-tool, which uses WindowsPerf as a backend on Windows on Arm. This tool applies the top-down methodology to break down CPU performance into hierarchical levels, providing a detailed and systematic approach to performance analysis.
The `topdown-tool` uses WindowsPerf to access the PMU events and metrics on Windows on Arm, enabling it to gather and analyze performance data directly from the hardware. This integration allows the `topdown-tool` to provide a comprehensive view of the system’s performance, from high-level metrics to low-level, detailed μarch events.
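
The idea of deriving metrics, and groups of metrics, from raw PMU event counts can be sketched as follows. This is an illustration only: the counts are made up, the helper names are hypothetical, and the formulas are simplified ratios rather than the per-CPU formulas defined in the Arm telemetry specifications. The event names follow the Arm PMU events `CPU_CYCLES`, `INST_RETIRED`, `STALL_FRONTEND`, and `STALL_BACKEND`.

```python
def ipc(counts):
    """Instructions retired per cycle."""
    return counts["inst_retired"] / counts["cpu_cycles"]


def stalled_pct(counts, stall_event):
    """Percentage of cycles in which the given stall event was counted."""
    return 100.0 * counts[stall_event] / counts["cpu_cycles"]


# A hypothetical metric group: each metric is a formula over event counts.
METRIC_GROUP = {
    "ipc": ipc,
    "frontend_stalled_pct": lambda c: stalled_pct(c, "stall_frontend"),
    "backend_stalled_pct": lambda c: stalled_pct(c, "stall_backend"),
}

# Made-up event counts standing in for one counting run.
counts = {
    "cpu_cycles": 1_000_000,
    "inst_retired": 1_500_000,
    "stall_frontend": 200_000,
    "stall_backend": 350_000,
}

report = {name: formula(counts) for name, formula in METRIC_GROUP.items()}
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Grouping formulas this way mirrors the top-down approach: a high-level group is evaluated first, and only the dominant category is drilled into with more detailed events.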
You can find the latest WindowsPerf installation instructions in INSTALL.md.
You can find all binary releases of WindowsPerf (`wperf-driver` and `wperf` application) here.
You can find the latest WindowsPerf build instructions in BUILD.md.
When contributing to this repository, please first read the CONTRIBUTING.md file for details on how to contribute to this project.
The WindowsPerf solution contains a few projects:
- wperf is a perf-like user space command line interface tool.
- wperf-test contains unit tests for the `wperf` project.
- wperf-driver is a Kernel-Mode Driver Framework (KMDF) driver.
  - See the Using WDF to Develop a Driver article for more details on KMDF.
- wperf-devgen is our own simple tool that can install or remove wperf-driver.
  - See INSTALL.md for more details and usage.
- wperf-installer is our Windows Installer project. The project uses WiX Toolset to build an MSI package to install WindowsPerf. This project requires WiX v5.
  - See wperf-installer/README.md for more details.
- wperf-lib is our WindowsPerf C library. Please note that it does not support all the latest features of WindowsPerf.
  - wperf-lib-app is an example application linked with `wperf-lib`.
    - wperf-lib-c-compat is a smoke test application for `wperf-lib`.
      - wperf-lib-timeline is a smoke test application for `wperf-lib`.
Other directories contain:
- wperf-common contains code shared between the `wperf` and `wperf-driver` projects, mostly data structures describing the IOCTL binary protocol.
  - Note: the wperf application communicates with wperf-driver via an IOCTL buffer. A proprietary binary protocol is used to exchange data, commands, and status between the two.
- wperf-scripts contains various scripts, including testing scripts.
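
As a rough illustration of what exchanging commands over a binary buffer involves, the sketch below packs and unpacks a request with Python's `struct` module. The layout (command id, core index, payload length, payload) and the function names are entirely hypothetical and are not the real WindowsPerf protocol; the actual data structures are defined in wperf-common.

```python
import struct

# Hypothetical header layout (NOT the real wperf protocol): little-endian
# uint32 command id, uint32 core index, uint32 payload length.
HEADER = struct.Struct("<III")


def pack_request(command, core, payload=b""):
    """Serialize a request into one contiguous buffer."""
    return HEADER.pack(command, core, len(payload)) + payload


def unpack_request(buffer):
    """Parse a buffer produced by pack_request, validating the length field."""
    command, core, length = HEADER.unpack_from(buffer, 0)
    payload = buffer[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return command, core, payload


# Round trip with made-up values.
request = pack_request(command=1, core=3, payload=b"\x11\x00")
decoded = unpack_request(request)
print(len(request), decoded)
```

Both sides of such a protocol must agree on field order, width, and byte order, which is why keeping the structure definitions in one shared directory is a sensible design.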
For more information about the project, visit the WindowsPerf Wiki.
- WindowsPerf Release 3.7.2 blog post.
- WindowsPerf Release 3.3.0 blog post.
- WindowsPerf Release 3.0.0 blog post.
- WindowsPerf Release 2.5.1 blog post.
- WindowsPerf release 2.4.0 introduces the first stable version of sampling model support blog post.
- Announcing WindowsPerf: Open-source performance analysis tool for Windows on Arm blog post.
- Enhancements in WindowsPerf.
- Boost your workload and platform performance on Windows on Arm with WindowsPerf.
- Perf for Windows on Arm (WindowsPerf).
- Get started with WindowsPerf.
- Sampling CPython with WindowsPerf.
- Arm Neoverse N1 PMU Guide.
- Arm Neoverse V1 PMU Guide.
- Arm Neoverse N2 PMU Guide.
- Arm Neoverse V2 PMU Guide.
- Arm CPU Telemetry Solution Topdown Methodology Specification.
- Arm Telemetry Solution Tools.
- Arm Neoverse N1 Core Telemetry Specification.
- Arm Neoverse V1 Core Telemetry Specification.
- Arm Neoverse N2 Core Telemetry Specification.
- Arm Neoverse V2 Core Telemetry Specification.
- Arm Neoverse N3 Core Telemetry Specification.
- Arm Neoverse V3 Core Telemetry Specification.
- Arm Statistical Profiling Extension: Performance Analysis Methodology White Paper documentation.
- Arm Neoverse V1 – Top-down Methodology for Performance Analysis & Telemetry Specification blog (with white paper).
- ARM64 Intrinsics documentation.
- Building and Loading a WDF Driver documentation.
- Write a Universal Windows driver (KMDF) based on a template documentation.