Fix GPU profiling per PR
ivberg authored Feb 7, 2024
1 parent d95c0f4 commit c87a05c
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/performance/tune-performance/profiling-tools.md
@@ -64,7 +64,7 @@ As covered in [logging](logging_tracing.md) ONNX supports dynamic enablement of
- greater than 5 = profiling_level=detailed (individual ops are logged with inference perf hit)
- Event: [QNNProfilingEvent](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/builder/qnn_backend_manager.cc#L1083)

-## CUDA Profiling
+## GPU Profiling

To profile CUDA kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with `--enable_cuda_profiling`.
To profile ROCm kernels, please add the roctracer library to your PATH and use the onnxruntime binary built from source with `--enable_rocm_profiling`.
@@ -83,4 +83,4 @@ If an operator called multiple kernels during execution, the performance numbers
{"cat":"Node", "name":<name of the node>, ...}
{"cat":"Kernel", "name":<name of the kernel called first>, ...}
{"cat":"Kernel", "name":<name of the kernel called next>, ...}
-```
+```
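
For readers of the documented output format, here is a minimal sketch of how the `Node` and `Kernel` events shown in the second hunk could be grouped after profiling. It assumes the profile is a JSON array of event objects in which each `Kernel` event follows the `Node` event that launched it, and that each event carries a `dur` field; the function name `group_kernels_by_node`, the sample event names, and the `dur` values are all hypothetical and not taken from this commit.

```python
import json


def group_kernels_by_node(events):
    """Attach each "Kernel" event to the most recent preceding "Node" event.

    Assumption: kernel events appear directly after the node that called them,
    as in the documented example. Kernel events seen before any node are skipped.
    """
    groups = []
    current = None
    for event in events:
        category = event.get("cat")
        if category == "Node":
            current = {"node": event.get("name"), "kernels": [], "kernel_dur": 0}
            groups.append(current)
        elif category == "Kernel" and current is not None:
            current["kernels"].append(event.get("name"))
            # "dur" is assumed to be a numeric duration; a missing value counts as 0.
            current["kernel_dur"] += event.get("dur", 0)
    return groups


# Hypothetical sample data shaped like the documented output.
sample_events = json.loads("""
[
  {"cat": "Node",   "name": "Conv_0", "dur": 20},
  {"cat": "Kernel", "name": "kernel_a", "dur": 5},
  {"cat": "Kernel", "name": "kernel_b", "dur": 7},
  {"cat": "Node",   "name": "Relu_1", "dur": 3}
]
""")

for group in group_kernels_by_node(sample_events):
    print(group["node"], group["kernels"], group["kernel_dur"])
```

For a real profile, the events would be loaded with `json.load` from the file that the profiling run wrote, in place of the inline sample.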
