diff --git a/docs/execution-providers/QNN-ExecutionProvider.md b/docs/execution-providers/QNN-ExecutionProvider.md index 8adb87d9cbc51..e4fed4abd17ab 100644 --- a/docs/execution-providers/QNN-ExecutionProvider.md +++ b/docs/execution-providers/QNN-ExecutionProvider.md @@ -46,6 +46,9 @@ The QNN Execution Provider supports a number of configuration options. The `prov |'basic'|| |'detailed'|| +See [profiling-tools](/docs/performance/tune-performance/profiling-tools.md) for more info on profiling +Alternanatively to setting profiling_level at compile time, profiling can be enabled dynamically with ETW (Windows). See [tracing](/docs/performance/tune-performance/logging_tracing.md) for more details + |`provider_options_values` for `provider_options_keys = "rpc_control_latency"`|Description| |---|---| |microseconds (string)|allows client to set up RPC control latency in microseconds| diff --git a/docs/performance/tune-performance/iobinding.md b/docs/performance/tune-performance/iobinding.md index baa97420d185f..d8c433fa9d7ad 100644 --- a/docs/performance/tune-performance/iobinding.md +++ b/docs/performance/tune-performance/iobinding.md @@ -2,7 +2,7 @@ title: I/O Binding grand_parent: Performance parent: Tune performance -nav_order: 4 +nav_order: 5 --- # I/O Binding diff --git a/docs/performance/tune-performance/logging_tracing.md b/docs/performance/tune-performance/logging_tracing.md new file mode 100644 index 0000000000000..bc99a453025ab --- /dev/null +++ b/docs/performance/tune-performance/logging_tracing.md @@ -0,0 +1,83 @@ +--- +title: Logging & Tracing +grand_parent: Performance +parent: Tune performance +nav_order: 2 +--- + +# Logging & Tracing + +## Contents +{: .no_toc } + +* TOC placeholder +{:toc} + + +## Developer Logging + +ONNX Runtime has built-in cross-platform internal [printf style logging LOGS()](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/common/logging/macros.h). This logging is available to configure in *production builds* for a dev **using the API**. + +There will likely be a performance penalty for using the default sink output (stdout) with higher log severity levels. + +### log_severity_level +[Python](https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.log_severity_level) (below) - [C/C++ CreateEnv](https://onnxruntime.ai/docs/api/c/struct_ort_api.html#a22085f699a2d1adb52f809383f475ed1) / [OrtLoggingLevel](https://onnxruntime.ai/docs/api/c/group___global.html#ga1c0fbcf614dbd0e2c272ae1cc04c629c) - [.NET/C#](https://onnxruntime.ai/docs/api/csharp/api/Microsoft.ML.OnnxRuntime.SessionOptions.html#Microsoft_ML_OnnxRuntime_SessionOptions_LogSeverityLevel) +```python +sess_opt = SessionOptions() +sess_opt.log_severity_level = 0 // Verbose +sess = ort.InferenceSession('model.onnx', sess_opt) +``` + +### Note +Note that [log_verbosity_level](https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.log_verbosity_level) is a separate setting and only available in DEBUG custom builds. + +## Tracing About + +Tracing is a super-set of logging in that tracing +- Includes the previously mentioned logging +- Adds tracing events that are more structured than printf style logging +- Can be integrated with a larger tracing eco-system of the OS, such that + - Tracing from multiple systems with ONNX, OS system level, and user-mode software that uses ONNX can be combined + - Timestamps are high resolution and consistent with other traced components + - Can log at high performance with a high number of events / second. + - Events are not logged via stdout, but instead usually via a high performance in memory sink + - Can be enabled dynamically at run-time to investigate issues including in production systems + +Currently, only Tracelogging combined with Windows ETW is supported, although [TraceLogging](https://github.com/microsoft/tracelogging) is cross-platform and support for other OSes instrumentation systems could be added. + +## Tracing - Windows + +There are 2 main ONNX Runtime TraceLogging providers that can be enabled at run-time that can be captured with Windows [ETW](https://learn.microsoft.com/en-us/windows-hardware/test/weg/instrumenting-your-code-with-etw) + +### ONNXRuntimeTraceLoggingProvider +Beginning in ONNX Runtime 1.17 the [ONNXRuntimeTraceLoggingProvider](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/platform/windows/logging/HowToValidateEtwSinkOutput.md) can also be enabled. + +This will dynamically trace with high-performance the previously mentioned LOGS() macro printf logs that were previously only controlled by log_severity_level. A user or developer tracing with this provider will have the log severity level set dynamically with what ETW level they provide at run-time. + +Provider Name: ONNXRuntimeTraceLoggingProvider +Provider GUID: 929DD115-1ECB-4CB5-B060-EBD4983C421D +Keyword: Logs (0x2) keyword per [logging.h](https://github.com/ivberg/onnxruntime/blob/user/ivberg/ETWRundown/include/onnxruntime/core/common/logging/logging.h#L83) +Level: 1 (CRITICAL ) through 5 (VERBOSE) per [TraceLoggingLevel](https://learn.microsoft.com/en-us/windows/win32/api/traceloggingprovider/nf-traceloggingprovider-tracelogginglevel#remarks) + +### Microsoft.ML.ONNXRuntime + +The [Microsoft.ML.ONNXRuntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/platform/windows/telemetry.cc#L47) provider provides structured logging. + +Provider Name: Microsoft.ML.ONNXRuntime +Provider GUID: 3a26b1ff-7484-7484-7484-15261f42614d +Keywords: Multiple per [logging.h](https://github.com/ivberg/onnxruntime/blob/user/ivberg/ETWRundown/include/onnxruntime/core/common/logging/logging.h#L81) +Level: 1 (CRITICAL ) through 5 (VERBOSE) per [TraceLoggingLevel](https://learn.microsoft.com/en-us/windows/win32/api/traceloggingprovider/nf-traceloggingprovider-tracelogginglevel#remarks) +Note: This provider supports ETW [CaptureState](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/capturestateonsave) (Rundown) for logging state for example when a trace is saved + +ORT 1.17 includes new events logging session options and EP provider options + +#### Profiling + +Microsoft.ML.ONNXRuntime can also output profiling events. That is covered in [profiling](profiling-tools.md) + +### WinML + +WindowsML has it's own tracing providers that be enabled in addition the providers above + +- Microsoft.Windows.WinML - d766d9ff-112c-4dac-9247-241cf99d123f +- Microsoft.Windows.AI.MachineLearning - BCAD6AEE-C08D-4F66-828C-4C43461A033D \ No newline at end of file diff --git a/docs/performance/tune-performance/memory.md b/docs/performance/tune-performance/memory.md index f868df4501f53..54b92c91a69ef 100644 --- a/docs/performance/tune-performance/memory.md +++ b/docs/performance/tune-performance/memory.md @@ -2,7 +2,7 @@ title: Memory consumption grand_parent: Performance parent: Tune performance -nav_order: 2 +nav_order: 3 --- # Reduce memory consumption diff --git a/docs/performance/tune-performance/profiling-tools.md b/docs/performance/tune-performance/profiling-tools.md index 66decc518cd52..fd358aaa8b6fa 100644 --- a/docs/performance/tune-performance/profiling-tools.md +++ b/docs/performance/tune-performance/profiling-tools.md @@ -38,6 +38,34 @@ In both cases, you will get a JSON file which contains the detailed performance * Type chrome://tracing in the address bar * Load the generated JSON file +## Execution Provider (EP) Profiling + +Starting with ONNX 1.17 support has been added to profile EPs or Neural Processing Unit (NPU)s, if that EP supports profiling in it's SDK + +## Qualcomm QNN EP + +As mentioned in the [QNN EP Doc](/docs/execution-providers/QNN-ExecutionProvider.md) profiling is supported + +### Cross-Platform CSV Tracing + +The Qualcomm AI Engine Direct SDK (QNN SDK) supports profiling. QNN will output to CSV in a text format if a dev were to use the QNN SDK directly outside ONNX. To enable equivalent functionality, ONNX mimics this support and outputs the same CSV formatting. + +If profiling_level is provided then ONNX will append log to current working directory a qnn-profiling-data.csv [file](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/builder/qnn_backend_manager.cc#L911) + +### TraceLogging ETW (Windows) Profiling + +As covered in [logging](logging_tracing.md) ONNX supports dynamic enablement of tracing ETW providers. Specifically the following settings. If the Tracelogging provider is enabled and profiling_level was provided, then CSV support is automatically disabled + +- Provider Name: Microsoft.ML.ONNXRuntime +- Provider GUID: 3a26b1ff-7484-7484-7484-15261f42614d +- Keywords: Profiling = 0x100 per [logging.h](https://github.com/ivberg/onnxruntime/blob/user/ivberg/ETWRundown/include/onnxruntime/core/common/logging/logging.h#L81) +- Level: + - 5 (VERBOSE) = profiling_level=basic (good details without perf loss) + - greater than 5 = profiling_level=detailed (individual ops are logged with inference perf hit) +- Event: [QNNProfilingEvent](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/builder/qnn_backend_manager.cc#L1083) + +## CUDA Profiling + To profile CUDA kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with `--enable_cuda_profiling`. To profile ROCm kernels, please add the roctracer library to your PATH and use the onnxruntime binary built from source with `--enable_rocm_profiling`. diff --git a/docs/performance/tune-performance/threading.md b/docs/performance/tune-performance/threading.md index 2b546422c080c..a6603dbadf589 100644 --- a/docs/performance/tune-performance/threading.md +++ b/docs/performance/tune-performance/threading.md @@ -2,7 +2,7 @@ title: Thread management grand_parent: Performance parent: Tune performance -nav_order: 3 +nav_order: 4 --- # Thread management diff --git a/docs/performance/tune-performance/troubleshooting.md b/docs/performance/tune-performance/troubleshooting.md index 51c7c6fd4ac1e..de481b3d38df5 100644 --- a/docs/performance/tune-performance/troubleshooting.md +++ b/docs/performance/tune-performance/troubleshooting.md @@ -2,7 +2,7 @@ title: Troubleshooting grand_parent: Performance parent: Tune performance -nav_order: 5 +nav_order: 6 --- # Troubleshooting performance issues