Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fully dynamic ETW controlled logging for ORT and QNN logs #20537

Merged
merged 12 commits into from
Jun 7, 2024

Conversation

ivberg
Copy link
Contributor

@ivberg ivberg commented May 1, 2024

Description

Windows - Fully dynamic ETW controlled logging for ORT and QNN logs

The logging support is documented here

Also add support for logging ORT SessionCreation on ETW CaptureState

Motivation and Context

The previous ETW support only worked if you enabled ETW before the session started. There can commonly be long-lived AI inference processes that need to be traced & debugged. This enables logging fully on the fly.

Without this support a dev would have to end up killing a process or stopping a service in order to get tracing. We had to do this for a recent issue with QNN, and it was a bit painful to get the logs and it ruined the repro.

Testing

I tested with the following cases

  • Leaving default ORT run
  • Enabling ETW prior to start and leaving running for entire session + inferences, then stopping
  • Starting ORT session + inf, then enabling and stopping ETW
    • Start ORT session /w long running Inferences
    • wpr -start ort.wprp -start etw_provider.wprp
    • Wait a few seconds
    • wpr -stop ort.etl
    • Inferences are still running
    • Verify ONNXRuntimeLogEvent provider events are present and new SessionCreation_CaptureState event under Microsoft.ML.ONNXRuntime provider

Related:
#18882
#19428

ivberg added 2 commits May 2, 2024 12:20
…c_cast and RTTI. Instead use a custom type system on ISink that is safe for use with static_cast
@jywu-msft
Copy link
Member

+@snnn , @skottmckay who helped review #18882

@jywu-msft jywu-msft requested review from snnn and skottmckay May 3, 2024 22:00
@snnn snnn requested a review from pranavsharma May 7, 2024 22:16
… events as ORT verbose. This is to preserve existing expected behavior by some customers. It also unfortunately means the only pratical method to get any QNN logs is to fully set the level to VERBOSE
@ivberg ivberg requested a review from HectorSVC May 29, 2024 19:23
onnxruntime/core/common/logging/logging.cc Outdated Show resolved Hide resolved
onnxruntime/core/common/logging/logging.cc Outdated Show resolved Hide resolved
include/onnxruntime/core/common/logging/logging.h Outdated Show resolved Hide resolved
onnxruntime/core/common/logging/logging.cc Outdated Show resolved Hide resolved
onnxruntime/core/common/logging/logging.cc Outdated Show resolved Hide resolved
@ivberg ivberg requested a review from skottmckay June 6, 2024 23:14
@ivberg ivberg merged commit 74028e4 into main Jun 7, 2024
96 checks passed
@ivberg ivberg deleted the user/ivberg/FullyDynamicETWLogging branch June 7, 2024 04:11
@sophies927 sophies927 added release:1.18.1 triage:approved Approved for cherrypicks for release labels Jun 7, 2024
ivberg added a commit that referenced this pull request Jun 20, 2024
### Description
Under certain conditions with enabling & disabling ETW continuously, we
got a crash report.
Allows ETW callbacks to be de-registered upon class destructor.
Related to #20537

### Motivation and Context
Fixes crash

### Callstack
We see it crash in
[0x0]
onnxruntime!<lambda_967a738fca8512372f170fcaf2d094d4>::operator()+0x34
0x12941ff570 0x7ffa994f0a04

[0x1] onnxruntime!std::_Func_class<void,_GUID const *,unsigned
long,unsigned char,unsigned __int64,unsigned
__int64,_EVENT_FILTER_DESCRIPTOR *,void *>::operator()+0x54 0x12941ff7b0
0x7ffa994f0d64

[0x2]
onnxruntime!onnxruntime::logging::EtwRegistrationManager::InvokeCallbacks+0xcc
0x12941ff7b0 0x7ffa994f0d64

[0x3]
onnxruntime!onnxruntime::logging::EtwRegistrationManager::ORT_TL_EtwEnableCallback+0x94
0x12941ff860 0x7ffa98d19628
 

and seems to us that the this pointer captured in 
etwRegistrationManager.RegisterInternalCallback(
      [&etwRegistrationManager, this](
...
is no longer valid when the callback is called.
@jywu-msft jywu-msft removed triage:approved Approved for cherrypicks for release release:1.18.1 labels Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants