diff --git a/docs/performance/tune-performance/profiling-tools.md b/docs/performance/tune-performance/profiling-tools.md
index 8ed6501db07e5..3d1973508fb88 100644
--- a/docs/performance/tune-performance/profiling-tools.md
+++ b/docs/performance/tune-performance/profiling-tools.md
@@ -64,7 +64,7 @@ As covered in [logging](logging_tracing.md) ONNX supports dynamic enablement of
 - greater than 5 = profiling_level=detailed (individual ops are logged with inference perf hit)
 - Event: [QNNProfilingEvent](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/builder/qnn_backend_manager.cc#L1083)
 
-## CUDA Profiling
+## GPU Profiling
 
 To profile CUDA kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with `--enable_cuda_profiling`.
 To profile ROCm kernels, please add the roctracer library to your PATH and use the onnxruntime binary built from source with `--enable_rocm_profiling`.
@@ -83,4 +83,4 @@ If an operator called multiple kernels during execution, the performance numbers
 {"cat":"Node", "name":, ...}
 {"cat":"Kernel", "name":, ...}
 {"cat":"Kernel", "name":, ...}
-```
\ No newline at end of file
+```