[doc] add config options for rocm ep (#19643)
Add config options for ROCM ep in docs.

### Motivation and Context
Since the ROCm EP docs do not describe any configuration options, this change adds config descriptions modeled on the CUDA EP docs.
kailums authored Feb 27, 2024
1 parent 69ca776 commit 4f39cf0
Showing 1 changed file with 83 additions and 0 deletions: docs/execution-providers/ROCm-ExecutionProvider.md
## Build
For build instructions, please see the [BUILD page](../build/eps.md#amd-rocm).

## Configuration Options

The ROCm Execution Provider supports the following configuration options.

### device_id

The device ID.

Default value: 0
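As a minimal sketch (the device index and commented-out model file name are hypothetical), provider options are passed to the session as a `(name, options)` tuple:

```python
# Hypothetical sketch: selecting GPU 1 instead of the default device 0.
# Option values are passed as strings in the provider-options dict.
rocm_options = {"device_id": "1"}
providers = [("ROCMExecutionProvider", rocm_options)]

# Creating the session requires a ROCm build of onnxruntime and a model file:
# import onnxruntime as ort
# sess = ort.InferenceSession("my_model.onnx", providers=providers)
```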

### tunable_op_enable

Set to true to enable TunableOp.

Default value: false

### tunable_op_tuning_enable

Set to true to allow TunableOp to perform online tuning.

Default value: false
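The two TunableOp flags are typically set together; a hedged sketch follows (it assumes the string-valued provider-option convention used elsewhere in this page):

```python
# Sketch: enabling TunableOp together with online tuning. Provider options
# are assumed to take boolean values as strings (e.g. "true" or "1").
tunable_options = {
    "tunable_op_enable": "true",         # use the tunable implementations of ops
    "tunable_op_tuning_enable": "true",  # let TunableOp search online for the fastest kernel
}
providers = [("ROCMExecutionProvider", tunable_options)]
```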

### user_compute_stream

Defines the compute stream for the inference to run on.
It implicitly sets the `has_user_compute_stream` option. It cannot be set through `UpdateROCMProviderOptions`.
This cannot be used in combination with an external allocator.

Example Python usage:

```python
import torch
import onnxruntime as ort

# torch.cuda APIs map to HIP streams when using a ROCm build of PyTorch.
providers = [("ROCMExecutionProvider", {"device_id": torch.cuda.current_device(),
                                        "user_compute_stream": str(torch.cuda.current_stream().cuda_stream)})]
sess_options = ort.SessionOptions()
sess = ort.InferenceSession("my_model.onnx", sess_options=sess_options, providers=providers)
```

To take advantage of a user compute stream, it is recommended to
use [I/O Binding](../api/python/api_summary.html) to bind inputs and outputs to tensors on the device.

### do_copy_in_default_stream

Whether to do copies in the default stream or use separate streams. The recommended setting is true. If false, copies may overlap with compute and improve performance, but race conditions are possible.

Default value: true

### gpu_mem_limit

The size limit of the device memory arena in bytes. This size limit is only for the execution provider's arena. The
total device memory usage may be higher.
Default value: max value of C++ size_t type (effectively unlimited)

_Note:_ This will be overridden by the contents of `default_memory_arena_cfg` (if specified).

### arena_extend_strategy

The strategy for extending the device memory arena.

Value | Description
----------------------|------------------------------------------------------------------------------
kNextPowerOfTwo (0) | subsequent extensions extend by larger amounts (multiplied by powers of two)
kSameAsRequested (1) | extend by the requested amount

Default value: kNextPowerOfTwo

_Note:_ This will be overridden by the contents of `default_memory_arena_cfg` (if specified).
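Putting the two arena options together, here is a sketch (the 2 GiB limit is an arbitrary example value, not a recommendation):

```python
# Sketch: cap the arena at 2 GiB and extend it only by the requested amount.
mem_options = {
    "gpu_mem_limit": str(2 * 1024 ** 3),          # limit in bytes, passed as a string
    "arena_extend_strategy": "kSameAsRequested",  # i.e. value 1
}
providers = [("ROCMExecutionProvider", mem_options)]
```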

### gpu_external_[alloc|free|empty_cache]

The `gpu_external_*` options are used to pass the addresses of external allocator functions.
Example Python usage:

```python
from onnxruntime.training.ortmodule.torch_cpp_extensions import torch_gpu_allocator

provider_option_map["gpu_external_alloc"] = str(torch_gpu_allocator.gpu_caching_allocator_raw_alloc_address())
provider_option_map["gpu_external_free"] = str(torch_gpu_allocator.gpu_caching_allocator_raw_delete_address())
provider_option_map["gpu_external_empty_cache"] = str(torch_gpu_allocator.gpu_caching_allocator_empty_cache_address())
```

Default value: 0

## Usage

### C/C++