[doc] add config options for rocm ep (#19643)
Add config options for ROCM ep in docs.

### Motivation and Context
Since the ROCm EP docs do not describe any configuration options, this change adds config descriptions modeled on the CUDA EP docs.
kailums authored Feb 27, 2024
1 parent 69ca776 commit 4f39cf0
Showing 1 changed file with 83 additions and 0 deletions: docs/execution-providers/ROCm-ExecutionProvider.md
## Build
For build instructions, please see the [BUILD page](../build/eps.md#amd-rocm).

## Configuration Options

The ROCm Execution Provider supports the following configuration options.

### device_id

The device ID.

Default value: 0
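As a minimal sketch (the device index and commented-out model file name are hypothetical), provider options are passed to the session as a `(name, options)` tuple:

```python
# Hypothetical sketch: selecting GPU 1 instead of the default device 0.
# Option values are passed as strings in the provider-options dict.
rocm_options = {"device_id": "1"}
providers = [("ROCMExecutionProvider", rocm_options)]

# Creating the session requires a ROCm build of onnxruntime and a model file:
# import onnxruntime as ort
# sess = ort.InferenceSession("my_model.onnx", providers=providers)
```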

### tunable_op_enable

Set to true to enable TunableOp.

Default value: false

### tunable_op_tuning_enable

Set to true to allow TunableOp to perform online tuning.

Default value: false
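The two TunableOp flags are typically set together; a hedged sketch follows (it assumes the string-valued provider-option convention used elsewhere in this page):

```python
# Sketch: enabling TunableOp together with online tuning. Provider options
# are assumed to take boolean values as strings (e.g. "true" or "1").
tunable_options = {
    "tunable_op_enable": "true",         # use the tunable implementations of ops
    "tunable_op_tuning_enable": "true",  # let TunableOp search online for the fastest kernel
}
providers = [("ROCMExecutionProvider", tunable_options)]
```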

### user_compute_stream

Defines the compute stream for the inference to run on.
It implicitly sets the `has_user_compute_stream` option. It cannot be set through `UpdateROCMProviderOptions`.
This cannot be used in combination with an external allocator.

Example Python usage:

```python
import torch
import onnxruntime as ort

# torch.cuda APIs map to HIP streams when using a ROCm build of PyTorch.
providers = [("ROCMExecutionProvider", {"device_id": torch.cuda.current_device(),
                                        "user_compute_stream": str(torch.cuda.current_stream().cuda_stream)})]
sess_options = ort.SessionOptions()
sess = ort.InferenceSession("my_model.onnx", sess_options=sess_options, providers=providers)
```

To take advantage of a user compute stream, it is recommended to
use [I/O Binding](../api/python/api_summary.html) to bind inputs and outputs to tensors on the device.

### do_copy_in_default_stream

Whether to do copies in the default stream or use separate streams. The recommended setting is true. If false, copies may overlap with compute and improve performance, but race conditions are possible.

Default value: true

### gpu_mem_limit

The size limit of the device memory arena in bytes. This size limit is only for the execution provider's arena. The
total device memory usage may be higher.
Default value: max value of C++ size_t type (effectively unlimited)

_Note:_ This will be overridden by the contents of `default_memory_arena_cfg` (if specified).

### arena_extend_strategy

The strategy for extending the device memory arena.

Value | Description
----------------------|------------------------------------------------------------------------------
kNextPowerOfTwo (0) | subsequent extensions extend by larger amounts (multiplied by powers of two)
kSameAsRequested (1) | extend by the requested amount

Default value: kNextPowerOfTwo

_Note:_ This will be overridden by the contents of `default_memory_arena_cfg` (if specified).
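Putting the two arena options together, here is a sketch (the 2 GiB limit is an arbitrary example value, not a recommendation):

```python
# Sketch: cap the arena at 2 GiB and extend it only by the requested amount.
mem_options = {
    "gpu_mem_limit": str(2 * 1024 ** 3),          # limit in bytes, passed as a string
    "arena_extend_strategy": "kSameAsRequested",  # i.e. value 1
}
providers = [("ROCMExecutionProvider", mem_options)]
```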

### gpu_external_[alloc|free|empty_cache]

The `gpu_external_*` options are used to pass the addresses of external allocator functions.
Example Python usage:

```python
from onnxruntime.training.ortmodule.torch_cpp_extensions import torch_gpu_allocator

provider_option_map["gpu_external_alloc"] = str(torch_gpu_allocator.gpu_caching_allocator_raw_alloc_address())
provider_option_map["gpu_external_free"] = str(torch_gpu_allocator.gpu_caching_allocator_raw_delete_address())
provider_option_map["gpu_external_empty_cache"] = str(torch_gpu_allocator.gpu_caching_allocator_empty_cache_address())
```

Default value: 0

## Usage

### C/C++