diff --git a/docs/performance/tune-performance/threading.md b/docs/performance/tune-performance/threading.md
index 18620eb4add9f..b82c391c460de 100644
--- a/docs/performance/tune-performance/threading.md
+++ b/docs/performance/tune-performance/threading.md
@@ -201,5 +201,6 @@ int main() {
 Note that `CreateThreadCustomized` and `JoinThreadCustomized`, once set, will be applied to both ORT intra op and inter op thread pools uniformly.
-
-
+## Usage in custom ops
+Since 1.17, custom op developers can accelerate their CPU code with the ORT intra-op thread pool.
+Please see the API and example for usage.
\ No newline at end of file
diff --git a/docs/reference/operators/add-custom-op.md b/docs/reference/operators/add-custom-op.md
index 0cb3626efb38f..727fae2a3b491 100644
--- a/docs/reference/operators/add-custom-op.md
+++ b/docs/reference/operators/add-custom-op.md
@@ -134,6 +134,7 @@ void KernelOne(const Ort::Custom::CudaContext& cuda_ctx,
 }
 ```
 Details could be found [here](https://github.com/microsoft/onnxruntime/tree/rel-1.16.0/onnxruntime/test/testdata/custom_op_library/cuda).
+To facilitate development, a wide variety of CUDA EP resources and configurations are exposed via `CudaContext`. Please see the header and usage for details.
 For ROCM, it is like: