Strange TdhGetEventInformation failure for SampleProf events #249

Open
clemenswasser opened this issue Nov 12, 2024 · 15 comments

Comments

@clemenswasser

Hello,

I am really sorry for "abusing" your Issue tracker, but I don't know any better place where I could report this issue and ask for help from ETW experts.

Backstory:

I am currently trying to use a Windows sampling profiler based on ETW and specifically the events emitted by the "NT Kernel Logger" on my new Windows PC. The sampling profiler uses the trace data helper (TDH) Win32 library (tdh.dll) for "decoding" the ETW events.
Some relevant events include: MSNT_SystemTrace/StackWalk/Stack, MSNT_SystemTrace/Process/Start and most importantly MSNT_SystemTrace/PerfInfo/SampleProf (with the actual samples).

Problem:

The problem is that for unknown reasons, the TdhGetEventInformation function fails specifically on my new PC and only for the MSNT_SystemTrace/PerfInfo/SampleProf EVENT_RECORDs. On any other PC I tested the function works correctly and on my new PC the function succeeds for all other EVENT_RECORDs except the SampleProf EVENT_RECORDs.

Repro:

A small repro code with an extracted MSNT_SystemTrace/PerfInfo/SampleProf EVENT_RECORD is here: https://gist.github.com/clemenswasser/e11f06eacbcc9118a6be88db445db77e
When running it on my new PC I get the following incorrect output (TdhGetEventInformation incorrectly returns ERROR_NOT_FOUND):

TdhGetEventInformation status = 1168

On any other PC I get the following correct output (TdhGetEventInformation correctly returns ERROR_INSUFFICIENT_BUFFER):

TdhGetEventInformation status = 122
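
For readers unfamiliar with the two numbers: they are the Win32 error codes ERROR_INSUFFICIENT_BUFFER (122) and ERROR_NOT_FOUND (1168). Below is a minimal Python sketch of how the result of such a size-probe call can be interpreted. The helper name `classify_probe_status` is hypothetical and not part of the repro gist, and the sketch assumes the probe passes no output buffer, so 122 is the expected answer on a healthy system.

```python
# Win32 error codes as defined in winerror.h.
ERROR_SUCCESS = 0
ERROR_INSUFFICIENT_BUFFER = 122
ERROR_NOT_FOUND = 1168


def classify_probe_status(status: int) -> str:
    """Interpret the status of a size-probe call to TdhGetEventInformation.

    Hypothetical helper for illustration only. It assumes the call was made
    without an output buffer, so a working system answers with
    ERROR_INSUFFICIENT_BUFFER and the required buffer size.
    """
    if status == ERROR_INSUFFICIENT_BUFFER:
        return "schema found; retry with a buffer of the reported size"
    if status == ERROR_NOT_FOUND:
        return "schema not found for this event"
    if status == ERROR_SUCCESS:
        return "decoded without needing a larger buffer"
    return f"unexpected Win32 error {status}"


for status in (122, 1168):
    print(status, "->", classify_probe_status(status))
```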

As I am not really experienced with ETW I don't have any clue as to why this happens (Is there some global ETW configuration I can repair or something similar?).
I already tried to debug the machine code of the TdhGetEventInformation implementation and compare it against a TTD (Time Travel Debugging) trace from a system where it works, but without much success.

@0xhellord

@clemenswasser
Author

> try this: ? https://techcommunity.microsoft.com/blog/askperf/wmi-recompiling-wmi-mofs/373848

I have now tried this, as well as the similar steps described in this article: https://woshub.com/wmi-troubleshooting/
Sadly, none of them helped :( Isn't there an option to completely nuke and restore those MOF files?
Or do I have to reinstall Windows... ?

@0xhellord

> > try this: ? https://techcommunity.microsoft.com/blog/askperf/wmi-recompiling-wmi-mofs/373848
>
> I have now tried this and also similar steps described in this article: https://woshub.com/wmi-troubleshooting/
> But sadly none of them helped :( Isn't there an option to completely nuke/restore those mof files, etc.
> Or do I have to reinstall Windows... ?

All the commands in these two articles are used to re-register or recompile MOF files. However, if some MOF files are missing (deleted or corrupted), these commands will not be effective.

Quick solution: consider reinstalling Windows. Alternatively, using DISM or Windows Recovery might also be worth a try.

Root cause analysis: compare the list and contents of MOF files with those from a freshly installed Windows system. You could also back up the existing MOF files, copy all MOF files from a freshly installed Windows system, and then recompile/register these files.
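
The comparison suggested above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function names `mof_digests` and `compare_mof_trees` are hypothetical, and it assumes both MOF directories (for example, a copy of the wbem folder from each machine) are reachable as local paths. It only reports which files differ by content hash; it does not interpret the MOF contents.

```python
import hashlib
from pathlib import Path


def mof_digests(root: Path) -> dict[str, str]:
    """Map each .mof file's lowercased relative path to its SHA-256 digest."""
    digests = {}
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() == ".mof":
            key = path.relative_to(root).as_posix().lower()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_mof_trees(reference: Path, suspect: Path) -> dict[str, list[str]]:
    """Report MOF files that are missing, extra, or changed in `suspect`."""
    ref = mof_digests(reference)
    sus = mof_digests(suspect)
    return {
        "missing": sorted(ref.keys() - sus.keys()),
        "extra": sorted(sus.keys() - ref.keys()),
        "changed": sorted(k for k in ref.keys() & sus.keys() if ref[k] != sus[k]),
    }
```

Calling `compare_mof_trees(Path("wbem_fresh"), Path("wbem_broken"))` (both directory names are placeholders) returns three sorted lists of relative paths, which narrows down where to look before recompiling anything.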

These suggestions are based on my personal experience. Please proceed at your own risk.

@clemenswasser
Author

I have now literally removed the complete C:\Windows\System32\wbem folder and successfully restored it using sfc /scannow, but it still doesn't work. I think my only option left is Windows Recovery or a complete fresh installation 😭

@clemenswasser
Author

I just did a completely fresh Windows 11 24H2 install, but I still have the bug, so it seems to be a regression with a new Windows update... Maybe they changed something about the MSNT_SystemTrace/PerfInfo/SampleProf events in the MOF files or something? How could I best report this, as none of my "Feedback Hub" Tickets were answered...

@jrmuizel

jrmuizel commented Dec 2, 2024

I also see this issue on Win 11 24H2 26100.2454

@jrmuizel

jrmuizel commented Dec 2, 2024

The problem also shows up when you load traces into traceview.

@clemenswasser when you tried using TTD, were you looking at svchost.exe? It seems like that's where the metadata reading actually happens. If you take a look at this profile of traceview/svchost loading an ETL file, it provides some insight into what's going on: https://share.firefox.dev/3VlP9Db

@clemenswasser
Author

@jrmuizel
This is what I noted while comparing the TTD traces of the correct execution on the old system with the incorrect execution on the new system:

  • tdh!TdhpGetTraceEventInfoFromMofInfo wrongly returns 0 in rax on new system
    • tdh!TdhpAllocAndGetTraceEventInfoArray wrongly returns non 0 in rax on new system
      • tdh!TdhpFindMatchClassFromWBEM wrongly returns non 0 in rax on new system

I tried to debug it side by side with Ghidra disassembly opened, but didn't get very far.
So something most definitely broke in the WBEM repository of 24H2; possibly the SampleProf definition is missing or incorrect?
Thanks for the idea of profiling the execution of TdhGetEventInformation. I looked through the profile a bit, and CRepository::QueryClasses looks interesting. Maybe we could inject a DLL or something similar and log what they are looking for in the WBEM/WMI repository? Then we would know what is missing, I guess.
Btw., the system where TdhGetEventInformation works correctly is on 23H2, so it broke between 23H2 and 24H2 (I am also on 26100.2454).

@jrmuizel

jrmuizel commented Dec 3, 2024

Looks like we can see the queries using wmimon

@jrmuizel

jrmuizel commented Dec 3, 2024

On a machine that's not having this problem I can use wbemtest to connect to root\wmi and then "Enum Classes" on MSNT_SystemTrace to find the PerfInfo classes. I haven't tried on the machine that's having the problem yet.

@clemenswasser
Author

wmimon confirms that TdhGetEventInformation calls CreateClassEnum multiple times for MSNT_SystemTrace, PerfInfo_V2 and SampledProfile. I compared the WBEM objects of PerfInfo_V2 and its derived class SampledProfile between both systems (working 23H2 and non-working 24H2), but they seem to be identical...

@jrmuizel
Copy link

jrmuizel commented Dec 3, 2024

@clemenswasser I saw your issue here: https://developercommunity.visualstudio.com/t/Function-TdhGetEventInformation-does-not/10800006?sort=active&topics=windows+10.0.18850. Did you file feedback through Feedback Hub? If so, if you can send me the link to the feedback I can pass it on to our Microsoft contact.

@clemenswasser
Author

Yes, I did. I filed one Feedback Hub ticket before the developer community ticket in the WPA category: https://aka.ms/AAtmswa
As you saw, the developer community said I should use the "API Feedback" category, so I also filed one there: https://aka.ms/AAtm0zp

@jrmuizel

jrmuizel commented Dec 3, 2024

Thanks, I've passed those links on to our contact

@clemenswasser
Author

@jrmuizel FYI: I have just updated to the latest "Windows 11 Insider Preview Build 27758.1000" as an experiment. This seems to be fixed in that version 🥳:

> .\TdhGetEventInformationBug.exe
SampleProf TdhGetEventInformation status = 122

mstange added a commit to mstange/samply that referenced this issue Dec 6, 2024
Fixes #348.

This works around missing schemas after a recent Windows 11 update.
Specifically, version 10.0.26100 24H2 was affected by this.

See microsoft/krabsetw#249 for some more details.
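
One way a consumer could cope with a missing schema is to decode the SampleProf payload by hand instead of asking TDH for it. The Python sketch below illustrates that idea only; it is not necessarily what the samply commit does. The names `SampledProfile` and `decode_sample_prof` are hypothetical, and the field layout (a pointer-sized InstructionPointer followed by a 32-bit ThreadId and a 32-bit Count, little-endian) is an assumption based on the documented SampledProfile MOF class that should be checked against real trace data.

```python
import struct
from typing import NamedTuple


class SampledProfile(NamedTuple):
    """Fields of a SampleProf event, as assumed from the SampledProfile MOF class."""
    instruction_pointer: int
    thread_id: int
    count: int


def decode_sample_prof(user_data: bytes, pointer_size: int = 8) -> SampledProfile:
    """Decode a SampleProf payload without a TDH schema (assumed layout).

    `pointer_size` is the pointer width of the traced system in bytes.
    """
    if pointer_size not in (4, 8):
        raise ValueError("pointer_size must be 4 or 8")
    # "<" selects little-endian with no padding between fields.
    fmt = "<QII" if pointer_size == 8 else "<III"
    needed = struct.calcsize(fmt)
    if len(user_data) < needed:
        raise ValueError(f"payload too short: {len(user_data)} < {needed} bytes")
    return SampledProfile(*struct.unpack_from(fmt, user_data))
```

A caller would try the normal TDH path first and fall back to something like this only when the lookup fails with ERROR_NOT_FOUND for this specific event.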