Commit
[Doc] CUDA build options (#19678)
tianleiwu authored Feb 29, 2024
1 parent 53a2b2b commit b6a1108
Showing 3 changed files with 20 additions and 10 deletions.
25 changes: 16 additions & 9 deletions docs/build/eps.md
@@ -46,33 +46,40 @@ The onnxruntime code will look for the provider shared libraries in the same loc
{: .no_toc }

* Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn) according to the [version compatibility matrix](../execution-providers/CUDA-ExecutionProvider.md#requirements).
-* The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the `--cuda_home` parameter.
-* The path to the cuDNN installation (include the `cuda` folder in the path) must be provided via the cuDNN_PATH environment variable, or `--cudnn_home` parameter. The cuDNN path should contain `bin`, `include` and `lib` directories.
+* The path to the CUDA installation must be provided via the CUDA_HOME environment variable, or the `--cuda_home` parameter.
+* The path to the cuDNN installation (include the `cuda` folder in the path) must be provided via the CUDNN_HOME environment variable, or `--cudnn_home` parameter. The cuDNN path should contain `bin`, `include` and `lib` directories.
 * The path to the cuDNN bin directory must be added to the PATH environment variable so that cudnn64_8.dll is found.
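On Linux, the environment setup these bullets describe might be sketched as follows; the install locations are hypothetical placeholders, not paths mandated by the docs:

```shell
# Hypothetical install locations; substitute your own paths.
export CUDA_HOME=/usr/local/cuda
export CUDNN_HOME=/usr/local/cudnn

# The cuDNN tree is expected to contain bin, include and lib.
for d in bin include lib; do
  [ -d "$CUDNN_HOME/$d" ] || echo "note: $CUDNN_HOME/$d not found"
done

# Put the cuDNN bin directory on PATH (on Windows this is what lets
# cudnn64_8.dll be found at run time).
export PATH="$CUDNN_HOME/bin:$PATH"
```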


### Build Instructions
{: .no_toc }

-With an additional CMake argument the CUDA EP can be compiled with additional NHWC ops.
-This option is not enabled by default due to the small amount of supported NHWC operators.
-Over time more operators will be added but for now append `--cmake_extra_defines onnxruntime_USE_CUDA_NHWC_OPS=ON` to below build scripts to compile with NHWC operators.
-Another very helpful CMake build option is to build with NVTX support (`onnxruntime_ENABLE_NVTX_PROFILE=ON`) that will enable much easier profiling using [Nsight Systems](https://developer.nvidia.com/nsight-systems) and correlates CUDA kernels with their actual ONNX operator.

#### Windows

```
.\build.bat --use_cuda --cudnn_home <cudnn home path> --cuda_home <cuda home path>
```

#### Linux

```
./build.sh --use_cuda --cudnn_home <cudnn home path> --cuda_home <cuda home path>
```
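The optional NHWC and NVTX CMake defines mentioned earlier can be appended to the same Linux command. This sketch only composes and prints the invocation rather than running it, and the CUDA/cuDNN paths are placeholders:

```shell
# Compose the optional extras on top of the base Linux build command.
# The cuda_home/cudnn_home paths below are placeholders.
cmd="./build.sh --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/local/cudnn"
cmd="$cmd --cmake_extra_defines onnxruntime_USE_CUDA_NHWC_OPS=ON"    # extra NHWC ops
cmd="$cmd --cmake_extra_defines onnxruntime_ENABLE_NVTX_PROFILE=ON"  # NVTX profiling
echo "$cmd"
```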

A Dockerfile is available [here](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles#cuda).

+### Build Options
+
+To specify GPU architectures (see [Compute Capability](https://developer.nvidia.com/cuda-gpus)), you can append parameters like `--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80;86;89`.
+
+With `--cmake_extra_defines onnxruntime_USE_CUDA_NHWC_OPS=ON`, the CUDA EP can be compiled with additional NHWC ops. This option is not enabled by default due to the small number of supported NHWC operators.
+
+Another helpful CMake build option is NVTX support (`--cmake_extra_defines onnxruntime_ENABLE_NVTX_PROFILE=ON`), which enables much easier profiling using [Nsight Systems](https://developer.nvidia.com/nsight-systems) and correlates CUDA kernels with their actual ONNX operators.
+
+`--enable_cuda_line_info` or `--cmake_extra_defines onnxruntime_ENABLE_CUDA_LINE_NUMBER_INFO=ON` enables [NVCC generation of line-number information for device code](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#generate-line-info-lineinfo). It can be helpful when you run [Compute Sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) tools on CUDA kernels.
+
+If your Windows machine has multiple versions of CUDA installed and you want to use an older one, append parameters like `--cuda_version <cuda version>`.
+
+When your build machine has many CPU cores and less than 64 GB of memory, there is a chance of an out-of-memory error like `nvcc error : 'cicc' died due to signal 9`. The solution is to limit the number of parallel NVCC jobs with parameters like `--parallel 4 --nvcc_threads 1`.
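Two of the knobs above can be derived from the machine itself. The snippet below is a sketch under two assumptions: that the installed driver's `nvidia-smi` supports the `compute_cap` query (newer drivers do), and that allowing roughly one parallel NVCC job per 8 GB of RAM is an acceptable heuristic (the 8 GB figure is illustrative, not from the docs):

```shell
# Compute capability for CMAKE_CUDA_ARCHITECTURES, e.g. "8.6" -> "86".
# Assumes a driver whose nvidia-smi supports the compute_cap query.
arch=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1 | tr -d '. ')

# Rough heuristic: one parallel NVCC job per 8 GB of system memory (Linux).
mem_gb=$(awk '/MemTotal/ {print int($2 / 1048576)}' /proc/meminfo)
mem_gb=${mem_gb:-0}
jobs=$((mem_gb / 8))
if [ "$jobs" -lt 1 ]; then jobs=1; fi

echo "--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=$arch --parallel $jobs --nvcc_threads 1"
```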

### Notes on older versions of ONNX Runtime, CUDA and Visual Studio
{: .no_toc }

2 changes: 1 addition & 1 deletion docs/build/inferencing.md
@@ -144,7 +144,7 @@ Note: unit tests will be skipped due to the incompatible CPU instruction set whe
| OS | Supports CPU | Supports GPU| Notes |
|-------------|:------------:|:------------:|------------------------------------|
-|Windows 10 | YES | YES | VS2019 through the latest VS2015 are supported |
+|Windows 10 | YES | YES | VS2019 through the latest VS2022 are supported |
|Windows 10 <br/> Subsystem for Linux | YES | NO | |
|Ubuntu 20.x/22.x | YES | YES | Also supported on ARM32v7 (experimental) |
|CentOS 7/8/9 | YES | YES | Also supported on ARM32v7 (experimental) |
3 changes: 3 additions & 0 deletions docs/execution-providers/CUDA-ExecutionProvider.md
@@ -24,6 +24,9 @@ The CUDA Execution Provider enables hardware accelerated computation on Nvidia C
Pre-built binaries of ONNX Runtime with CUDA EP are published for most language bindings. Please
reference [Install ORT](../install).

+## Build from source
+See [Build instructions](../build/eps.html#cuda).

## Requirements

Please reference table below for official GPU packages dependencies for the ONNX Runtime inferencing package. Note that
