From 7e64928b060b8eac133733c3d8fb18e73472c78a Mon Sep 17 00:00:00 2001
From: ivberg <ivberg@microsoft.com>
Date: Wed, 7 Feb 2024 10:47:15 -0800
Subject: [PATCH] Added docs for ONNX Runtime 1.17 covering logging, tracing,
 and QNN EP Profiling (#19428)

### Description
Added docs for ONNX Runtime 1.17 covering logging, tracing, and QNN EP Profiling

### Motivation and Context
- ONNX Runtime logging has not been documented
- ONNX Runtime tracing with Windows has barely been documented
- ONNX Runtime 1.17 adds new tracing and QNN EP Profiling

PRs: #16259, #18201, #18882, #19397
---
 docs/build/custom.md                          |  8 +-
 docs/build/eps.md                             |  2 +-
 .../QNN-ExecutionProvider.md                  |  3 +
 .../performance/tune-performance/iobinding.md |  2 +-
 .../tune-performance/logging_tracing.md       | 95 +++++++++++++++++++
 docs/performance/tune-performance/memory.md   |  2 +-
 .../tune-performance/profiling-tools.md       | 30 +++++-
 .../performance/tune-performance/threading.md |  2 +-
 .../tune-performance/troubleshooting.md       |  2 +-
 docs/tutorials/csharp/csharp-gpu.md           |  2 +-
 10 files changed, 137 insertions(+), 11 deletions(-)
 create mode 100644 docs/performance/tune-performance/logging_tracing.md

diff --git a/docs/build/custom.md b/docs/build/custom.md
index 93e1c1bfa221e..e270feac445a1 100644
--- a/docs/build/custom.md
+++ b/docs/build/custom.md
@@ -165,7 +165,7 @@ _[This section is coming soon]_
 
 ### iOS
 
-To produce pods for an iOS build, use the [build_and_assemble_ios_pods.py](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/apple/build_and_assemble_ios_pods.py) script from the ONNX Runtime repo.
+To produce pods for an iOS build, use the [build_and_assemble_apple_pods.py](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/apple/build_and_assemble_apple_pods.py) script from the ONNX Runtime repo.
 
 1. Check out the version of ONNX Runtime you want to use.
 
@@ -174,7 +174,7 @@ To produce pods for an iOS build, use the [build_and_assemble_ios_pods.py](https
     For example:
 
     ```bash
-    python3 tools/ci_build/github/apple/build_and_assemble_ios_pods.py \
+    python3 tools/ci_build/github/apple/build_and_assemble_apple_pods.py \
       --staging-dir /path/to/staging/dir \
       --include-ops-by-config /path/to/ops.config \
       --build-settings-file /path/to/build_settings.json
@@ -186,14 +186,14 @@ To produce pods for an iOS build, use the [build_and_assemble_ios_pods.py](https
 
     The reduced set of ops in the custom build is specified with the file provided to the `--include_ops_by_config` option. See the current op config used by the pre-built mobile package at [tools/ci_build/github/android/mobile_package.required_operators.config](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/android/mobile_package.required_operators.config) (Android and iOS pre-built mobile packages share the same config file). You can use this file directly.
 
-    The default package does not include the training APIs. To create a training package, add `--enable_training_apis` in the build options file provided to `--build-settings-file` and add the `--variant Training` option when calling `build_and_assemble_ios_pods.py`.
+    The default package does not include the training APIs. To create a training package, add `--enable_training_apis` in the build options file provided to `--build-settings-file` and add the `--variant Training` option when calling `build_and_assemble_apple_pods.py`.
     
     For example:
     
     ```bash
     # /path/to/build_settings.json is a file that includes the `--enable_training_apis` option
     
-    python3 tools/ci_build/github/apple/build_and_assemble_ios_pods.py \
+    python3 tools/ci_build/github/apple/build_and_assemble_apple_pods.py \
       --staging-dir /path/to/staging/dir \
       --include-ops-by-config /path/to/ops.config \
       --build-settings-file /path/to/build_settings.json \
diff --git a/docs/build/eps.md b/docs/build/eps.md
index 2c6e2c894824a..410944953d009 100644
--- a/docs/build/eps.md
+++ b/docs/build/eps.md
@@ -104,7 +104,7 @@ See more information on the TensorRT Execution Provider [here](../execution-prov
    * The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the `--cuda_home` parameter. The CUDA path should contain `bin`, `include` and `lib` directories.
    * The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found.
    * The path to the cuDNN installation (path to cudnn bin/include/lib) must be provided via the cuDNN_PATH environment variable, or `--cudnn_home` parameter.
-     * On Windows, cuDNN requires [zlibwapi.dll](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-zlib-windows). Feel free to place this dll under `path_to_cudnn/bin`  
+     * On Windows, cuDNN requires [zlibwapi.dll](https://docs.nvidia.com/deeplearning/cudnn/installation/windows.html). Feel free to place this dll under `path_to_cudnn/bin`  
  * Follow [instructions for installing TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html)
    * The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 8.6.
    * The path to TensorRT installation must be provided via the `--tensorrt_home` parameter.
diff --git a/docs/execution-providers/QNN-ExecutionProvider.md b/docs/execution-providers/QNN-ExecutionProvider.md
index 6ad125c231bef..61d0aae21c50b 100644
--- a/docs/execution-providers/QNN-ExecutionProvider.md
+++ b/docs/execution-providers/QNN-ExecutionProvider.md
@@ -55,6 +55,9 @@ The QNN Execution Provider supports a number of configuration options. These pro
 |'basic'||
 |'detailed'||
 
+See [profiling-tools](../performance/tune-performance/profiling-tools.md) for more information on profiling.  
+As an alternative to setting profiling_level at compile time, profiling can be enabled dynamically with ETW on Windows. See [tracing](../performance/tune-performance/logging_tracing.md) for more details.
+
 |`"rpc_control_latency"`|Description|
 |---|---|
 |microseconds (string)|allows client to set up RPC control latency in microseconds|
diff --git a/docs/performance/tune-performance/iobinding.md b/docs/performance/tune-performance/iobinding.md
index baa97420d185f..d8c433fa9d7ad 100644
--- a/docs/performance/tune-performance/iobinding.md
+++ b/docs/performance/tune-performance/iobinding.md
@@ -2,7 +2,7 @@
 title: I/O Binding
 grand_parent: Performance
 parent: Tune performance
-nav_order: 4
+nav_order: 5
 ---
 
 # I/O Binding
diff --git a/docs/performance/tune-performance/logging_tracing.md b/docs/performance/tune-performance/logging_tracing.md
new file mode 100644
index 0000000000000..ff0aa7684befd
--- /dev/null
+++ b/docs/performance/tune-performance/logging_tracing.md
@@ -0,0 +1,95 @@
+---
+title: Logging & Tracing
+grand_parent: Performance
+parent: Tune performance
+nav_order: 2
+---
+
+# Logging & Tracing
+
+## Contents
+{: .no_toc }
+
+* TOC placeholder
+{:toc}
+
+
+## Developer Logging
+
+ONNX Runtime has built-in cross-platform internal [printf style logging LOGS()](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/common/logging/macros.h). This logging can be configured in *production builds* by a developer **using the API**.
+
+There will likely be a performance penalty when using the default sink output (stdout) with more verbose log severity levels such as VERBOSE.
+
+### log_severity_level
+[Python](https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.log_severity_level) (below) - [C/C++ CreateEnv](https://onnxruntime.ai/docs/api/c/struct_ort_api.html#a22085f699a2d1adb52f809383f475ed1) / [OrtLoggingLevel](https://onnxruntime.ai/docs/api/c/group___global.html#ga1c0fbcf614dbd0e2c272ae1cc04c629c) - [.NET/C#](https://onnxruntime.ai/docs/api/csharp/api/Microsoft.ML.OnnxRuntime.SessionOptions.html#Microsoft_ML_OnnxRuntime_SessionOptions_LogSeverityLevel)
+```python
+import onnxruntime as ort
+
+sess_opt = ort.SessionOptions()
+sess_opt.log_severity_level = 0  # 0 = VERBOSE
+sess = ort.InferenceSession('model.onnx', sess_options=sess_opt)
+```
+
+### Note
+Note that [log_verbosity_level](https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.log_verbosity_level) is a separate setting that is only available in DEBUG custom builds.
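+
+As a minimal sketch, the two settings can be combined as follows (the verbosity setting only takes effect when running a DEBUG custom build):
+
+```python
+import onnxruntime as ort
+
+sess_opt = ort.SessionOptions()
+sess_opt.log_severity_level = 0   # VERBOSE severity
+sess_opt.log_verbosity_level = 2  # higher values print more detail; DEBUG builds only
+sess = ort.InferenceSession('model.onnx', sess_options=sess_opt)
+```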
+
+## About Tracing
+
+Tracing is a superset of logging, in that tracing
+- Includes the previously mentioned logging
+- Adds tracing events that are more structured than printf style logging
+- Can be integrated with the larger tracing eco-system of the OS, such that
+  - Tracing from ONNX Runtime, OS system-level components, and user-mode software that uses ONNX Runtime can be combined
+  - Timestamps are high resolution and consistent with other traced components
+  - Events can be logged at high performance, with a high number of events per second
+  - Events are not logged via stdout, but instead usually via a high-performance in-memory sink
+  - Tracing can be enabled dynamically at run-time to investigate issues, including on production systems
+
+Currently, only TraceLogging combined with Windows ETW is supported, although [TraceLogging](https://github.com/microsoft/tracelogging) is cross-platform and support for other operating systems' instrumentation systems could be added.
+
+## Tracing - Windows
+
+There are 2 main ONNX Runtime TraceLogging providers that can be enabled at run-time and captured with Windows [ETW](https://learn.microsoft.com/en-us/windows-hardware/test/weg/instrumenting-your-code-with-etw).
+
+### Quickstart Tracing with WPR
+
+On Windows, you can use Windows Performance Recorder ([WPR](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/wpr-command-line-options)) to capture a trace. The 2 providers covered below are already configured in these WPR profiles.
+
+- Download [ort.wprp](https://github.com/microsoft/onnxruntime/blob/main/ort.wprp) and [etw_provider.wprp](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/platform/windows/logging/etw_provider.wprp) (these could also be combined later)
+
+```dos
+wpr -start ort.wprp -start etw_provider.wprp
+echo Repro the issue allowing ONNX to run
+wpr -stop onnx.etl -compress
+```
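+
+The resulting `onnx.etl` file can then be opened for analysis in [Windows Performance Analyzer (WPA)](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer).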
+
+### ONNXRuntimeTraceLoggingProvider
+Beginning in ONNX Runtime 1.17, the [ONNXRuntimeTraceLoggingProvider](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/platform/windows/logging/HowToValidateEtwSinkOutput.md) can also be enabled.
+
+This provider dynamically traces, with high performance, the previously mentioned LOGS() macro printf style logs, which are otherwise controlled only by log_severity_level. A user or developer tracing with this provider sets the log severity level dynamically via the ETW level they provide at run-time.
+
+Provider Name: ONNXRuntimeTraceLoggingProvider  
+Provider GUID: 929DD115-1ECB-4CB5-B060-EBD4983C421D  
+Keyword: Logs (0x2) per [logging.h](https://github.com/ivberg/onnxruntime/blob/user/ivberg/ETWRundown/include/onnxruntime/core/common/logging/logging.h#L83)  
+Level: 1 (CRITICAL) through 5 (VERBOSE) per [TraceLoggingLevel](https://learn.microsoft.com/en-us/windows/win32/api/traceloggingprovider/nf-traceloggingprovider-tracelogginglevel#remarks)  
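+
+For example, here is a minimal sketch of enabling this provider with `logman` from an elevated command prompt, using the GUID, keyword, and level above (the session and output file names are arbitrary):
+
+```dos
+logman start ort_logs -p "{929DD115-1ECB-4CB5-B060-EBD4983C421D}" 0x2 0x5 -o ort_logs.etl -ets
+echo Repro the issue allowing ONNX Runtime to run
+logman stop ort_logs -ets
+```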
+
+### Microsoft.ML.ONNXRuntime
+
+The [Microsoft.ML.ONNXRuntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/platform/windows/telemetry.cc#L47) provider emits structured trace events.  
+
+Provider Name: Microsoft.ML.ONNXRuntime  
+Provider GUID: 3a26b1ff-7484-7484-7484-15261f42614d  
+Keywords: Multiple per [logging.h](https://github.com/ivberg/onnxruntime/blob/user/ivberg/ETWRundown/include/onnxruntime/core/common/logging/logging.h#L81)  
+Level: 1 (CRITICAL) through 5 (VERBOSE) per [TraceLoggingLevel](https://learn.microsoft.com/en-us/windows/win32/api/traceloggingprovider/nf-traceloggingprovider-tracelogginglevel#remarks)  
+Note: This provider supports ETW [CaptureState](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/capturestateonsave) (Rundown), which logs state, for example when a trace is saved.
+
+ORT 1.17 includes new events that log session options and EP provider options.
+
+#### Profiling
+
+Microsoft.ML.ONNXRuntime can also output profiling events. This is covered in [profiling-tools](profiling-tools.md).
+
+### WinML
+
+WindowsML has its own tracing providers that can be enabled in addition to the providers above:
+
+- Microsoft.Windows.WinML - d766d9ff-112c-4dac-9247-241cf99d123f
+- Microsoft.Windows.AI.MachineLearning - BCAD6AEE-C08D-4F66-828C-4C43461A033D
\ No newline at end of file
diff --git a/docs/performance/tune-performance/memory.md b/docs/performance/tune-performance/memory.md
index f868df4501f53..54b92c91a69ef 100644
--- a/docs/performance/tune-performance/memory.md
+++ b/docs/performance/tune-performance/memory.md
@@ -2,7 +2,7 @@
 title: Memory consumption
 grand_parent: Performance
 parent: Tune performance
-nav_order: 2
+nav_order: 3
 ---
 
 # Reduce memory consumption
diff --git a/docs/performance/tune-performance/profiling-tools.md b/docs/performance/tune-performance/profiling-tools.md
index 66decc518cd52..3d1973508fb88 100644
--- a/docs/performance/tune-performance/profiling-tools.md
+++ b/docs/performance/tune-performance/profiling-tools.md
@@ -38,6 +38,34 @@ In both cases, you will get a JSON file which contains the detailed performance
   * Type chrome://tracing in the address bar
   * Load the generated JSON file
 
+## Execution Provider (EP) Profiling
+
+Starting with ONNX Runtime 1.17, support has been added for profiling EPs or Neural Processing Units (NPUs), provided that the EP supports profiling in its SDK.
+
+## Qualcomm QNN EP
+
+As mentioned in the [QNN EP doc](../../execution-providers/QNN-ExecutionProvider.md), profiling is supported.
+
+### Cross-Platform CSV Tracing
+
+The Qualcomm AI Engine Direct SDK (QNN SDK) supports profiling. When a developer uses the QNN SDK directly, outside of ONNX Runtime, QNN outputs profiling data as CSV text. To provide equivalent functionality, ONNX Runtime mimics this support and outputs the same CSV format.
+
+If profiling_level is provided, ONNX Runtime appends profiling logs to a qnn-profiling-data.csv [file](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/builder/qnn_backend_manager.cc#L911) in the current working directory.
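+
+As a minimal sketch (assuming an ONNX Runtime build with the QNN EP and the `QnnHtp.dll` backend), profiling_level can be passed as a QNN EP provider option from Python:
+
+```python
+import onnxruntime as ort
+
+# 'basic' or 'detailed' enables QNN profiling; with no ETW session active,
+# results are appended to qnn-profiling-data.csv in the current working directory
+sess = ort.InferenceSession(
+    'model.onnx',
+    providers=['QNNExecutionProvider'],
+    provider_options=[{
+        'backend_path': 'QnnHtp.dll',
+        'profiling_level': 'basic',
+    }],
+)
+```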
+
+### TraceLogging ETW (Windows) Profiling
+
+As covered in [logging & tracing](logging_tracing.md), ONNX Runtime supports dynamically enabling tracing via ETW providers, specifically with the following settings (see the sketch after this list). If the TraceLogging provider is enabled and profiling_level was provided, CSV output is automatically disabled.
+
+- Provider Name: Microsoft.ML.ONNXRuntime  
+- Provider GUID: 3a26b1ff-7484-7484-7484-15261f42614d  
+- Keywords: Profiling = 0x100  per [logging.h](https://github.com/ivberg/onnxruntime/blob/user/ivberg/ETWRundown/include/onnxruntime/core/common/logging/logging.h#L81)  
+- Level:
+  - 5 (VERBOSE) = profiling_level=basic (good detail without a performance loss)
+  - Greater than 5 = profiling_level=detailed (individual ops are logged, with an inference performance hit)
+- Event: [QNNProfilingEvent](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/builder/qnn_backend_manager.cc#L1083)
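+
+For example, here is a minimal `logman` sketch (the session and output file names are arbitrary) that enables QNN profiling dynamically using the GUID, keyword, and level above:
+
+```dos
+logman start qnn_profiling -p "{3a26b1ff-7484-7484-7484-15261f42614d}" 0x100 0x5 -o qnn_profiling.etl -ets
+echo Run inference with the QNN EP here
+logman stop qnn_profiling -ets
+```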
+
+## GPU Profiling
+
 To profile CUDA kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with `--enable_cuda_profiling`.
 To profile ROCm kernels, please add the roctracer library to your PATH and use the onnxruntime binary built from source with `--enable_rocm_profiling`. 
 
@@ -55,4 +83,4 @@ If an operator called multiple kernels during execution, the performance numbers
 {"cat":"Node", "name":<name of the node>, ...}
 {"cat":"Kernel", "name":<name of the kernel called first>, ...}
 {"cat":"Kernel", "name":<name of the kernel called next>, ...}
-```
\ No newline at end of file
+```
diff --git a/docs/performance/tune-performance/threading.md b/docs/performance/tune-performance/threading.md
index 2b546422c080c..a6603dbadf589 100644
--- a/docs/performance/tune-performance/threading.md
+++ b/docs/performance/tune-performance/threading.md
@@ -2,7 +2,7 @@
 title: Thread management
 grand_parent: Performance
 parent: Tune performance
-nav_order: 3
+nav_order: 4
 ---
 
 # Thread management
diff --git a/docs/performance/tune-performance/troubleshooting.md b/docs/performance/tune-performance/troubleshooting.md
index 51c7c6fd4ac1e..de481b3d38df5 100644
--- a/docs/performance/tune-performance/troubleshooting.md
+++ b/docs/performance/tune-performance/troubleshooting.md
@@ -2,7 +2,7 @@
 title: Troubleshooting
 grand_parent: Performance
 parent: Tune performance
-nav_order: 5
+nav_order: 6
 ---
 
 # Troubleshooting performance issues
diff --git a/docs/tutorials/csharp/csharp-gpu.md b/docs/tutorials/csharp/csharp-gpu.md
index a7dd199073f7a..3f62fdd649781 100644
--- a/docs/tutorials/csharp/csharp-gpu.md
+++ b/docs/tutorials/csharp/csharp-gpu.md
@@ -31,7 +31,7 @@ See this table for supported versions:
 NOTE: Full table can be found [here](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements)
 
 
-- Follow section [2. Installing cuDNN on Windows](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-windows). NOTE: Skip step 5 in section 2.3 on updating Visual Studio settings, this is only for C++ projects.
+- Follow section [2. Installing cuDNN on Windows](https://docs.nvidia.com/deeplearning/cudnn/installation/windows.html). NOTE: Skip step 5 in section 2.3 on updating Visual Studio settings, this is only for C++ projects.
 
 - Restart your computer and verify the installation by running the following command or in python with PyTorch: