Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for profiling on Windows #141

Merged
merged 286 commits into from
May 1, 2024
Merged

Conversation

vvuk
Copy link
Collaborator

@vvuk vvuk commented Apr 20, 2024

This PR adds (initial) support for profiling on Windows. It uses the Event Tracing for Windows framework to get very fast, low-overhead captures.

In order to expedite getting something working, this PR calls out to the xperf.exe tool to start/stop tracking and save output to a file. It then parses events from that file, adding them to a profile. This is probably good enough for 95% use cases, but in the future it would be nice to use Etw directly to create an in-memory trace, handling events as they come in. Etw is just such an annoying API to work with that I didn't want to tackle that; I use the ferrisetw crate for event parsing, and it can probably be used for kicking off a kernel trace, but the process for enabling kernel providers is convoluted.

There's a few things that I need to fix:

  1. Sort out Windows administrator privilege elevation. I need administrator privileges for two things: to run xperf, and to query to list & global address of kernel drivers. Right now, I require that samply be run as administrator. But a better solution would be to spawn an elevated child process that can do all the elevated things, while the main process takes care of launching the process to be profiled (right now it's launched as Administrator, not great!)
  2. Implement profiling by pid. Trivial, I just haven't written the code yet. I'll also add support for just profiling the entire system, basically samply start and wait for input to stop profiling. (Or samply start --time 10s).
  3. Sort out some arch bits, right now there's some hardcoded aarch64 which isn't even correct (should be ARM64).
  4. Grab thread names. Double check main-thread rules (I'm not sure if the main thread and the process have the same PID, or if there's a different TID for a process's main thread. I think the latter, but double check.)
  5. Actually test on a system where I can use the profile provider :) ARM64 Windows seems to have a bug where the kernel PROFILE provider just doesn't work; errors out when you try to start. And I'm on a Apple Silicon Mac using Parallels, with no x86_64 (or non-VM ARM64) for another week.

But it works, and works quite well!

Future work:

  • Add support for JIT events, especially CoreCLR. Luckily, ETW is pretty pervasive on Windows, and CoreCLR delivers Jit events via ETW, so no need to mess around with jitdump files.
  • A bunch of things that will probably involve helping the profiler front-end to not be so Firefox specific so that we can capture additional data, but I'm scared of the front end code :|

jrmuizel and others added 30 commits October 3, 2021 14:58
Including:
- add stacks that don't start in user space
- properly register newly started threads
- use ProcessId from the user data instead of the event on thread start
  events
This allows us to avoid rebuilding it for every event.
They're currently broken
vvuk and others added 12 commits April 26, 2024 17:09
This now gets further when converting an x86_64 etl
file which I recorded with the following:

```
xperf -start "NT Kernel Logger" -SetProfInt 2500 -on latency -stackwalk profile+cswitch -start "usersession" -on Microsoft-JScript:0x3+d2d578d9-2936-45b6-a09f-30e32715f42d:0x8000000000010000+c923f508-96e4-5515-e32c-7539d1b10504:0x6
xperf -stop "NT Kernel Logger" -stop "usersession" -d prof.etl
```

In detail:
 - Move the things that need admin privileges to a place where they don't run.
 - Remove the tokio runtime and the call to get_library_info_for_binary
 - Stub out the kernel librray stuff
 - Put a default for the kernel start address
 - Set arch to x86_64 when running on x86_64, so that user and kernel
   stacks are merged correctly
 - Remove an unwrap which was panicking on the file I was testing with
samply/src/windows/mod.rs Outdated Show resolved Hide resolved
@vvuk vvuk marked this pull request as ready for review May 1, 2024 18:32
@mstange mstange merged commit ecbec37 into mstange:main May 1, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants