From b45639835897413c02796c732e54c54381ad6ce3 Mon Sep 17 00:00:00 2001
From: Randy Shuai
Date: Tue, 16 Jan 2024 16:45:21 -0800
Subject: [PATCH 1/4] doc tp API and new cuda resources

---
 docs/performance/tune-performance/threading.md | 5 +++--
 docs/reference/operators/add-custom-op.md      | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/performance/tune-performance/threading.md b/docs/performance/tune-performance/threading.md
index 18620eb4add9f..b82c391c460de 100644
--- a/docs/performance/tune-performance/threading.md
+++ b/docs/performance/tune-performance/threading.md
@@ -201,5 +201,6 @@ int main() {
 
 Note that `CreateThreadCustomized` and `JoinThreadCustomized`, once set, will be applied to both ORT intra op and inter op thread pools uniformly.
 
-
-
+## Usage in custom ops
+Since 1.17, custom op developers are entitled to accelerate their code on cpu with ort intra-op thread pool.
+Please see the API and example for usage.
\ No newline at end of file
diff --git a/docs/reference/operators/add-custom-op.md b/docs/reference/operators/add-custom-op.md
index 0cb3626efb38f..727fae2a3b491 100644
--- a/docs/reference/operators/add-custom-op.md
+++ b/docs/reference/operators/add-custom-op.md
@@ -134,6 +134,7 @@ void KernelOne(const Ort::Custom::CudaContext& cuda_ctx,
 }
 ```
 Details could be found [here](https://github.com/microsoft/onnxruntime/tree/rel-1.16.0/onnxruntime/test/testdata/custom_op_library/cuda).
+To facilitate the development, a wide variety of cuda ep resources/configurations are exposed via CudaContext, please see the header and usage for detail.
 
 For ROCM, it is like:
 

From 982a40becef97973e8764cc55903f649f0fef942 Mon Sep 17 00:00:00 2001
From: Randy Shuai
Date: Tue, 16 Jan 2024 18:15:50 -0800
Subject: [PATCH 2/4] add links

---
 docs/performance/tune-performance/threading.md | 4 ++--
 docs/reference/operators/add-custom-op.md      | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/performance/tune-performance/threading.md b/docs/performance/tune-performance/threading.md
index b82c391c460de..f90f5ca0f48e3 100644
--- a/docs/performance/tune-performance/threading.md
+++ b/docs/performance/tune-performance/threading.md
@@ -202,5 +202,5 @@ int main() {
 Note that `CreateThreadCustomized` and `JoinThreadCustomized`, once set, will be applied to both ORT intra op and inter op thread pools uniformly.
 
 ## Usage in custom ops
-Since 1.17, custom op developers are entitled to accelerate their code on cpu with ort intra-op thread pool.
-Please see the API and example for usage.
\ No newline at end of file
+Since 1.17, custom op developers are entitled to parallelize their cpu code by ort intra-op thread pool.
+Please refer to the [API](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/include/onnxruntime/core/session/onnxruntime_cxx_inline.h#L1681), and [example](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/onnxruntime/test/testdata/custom_op_library/cpu/cpu_ops.cc#L87) for usage.
\ No newline at end of file
diff --git a/docs/reference/operators/add-custom-op.md b/docs/reference/operators/add-custom-op.md
index 727fae2a3b491..0dc362b9e0dac 100644
--- a/docs/reference/operators/add-custom-op.md
+++ b/docs/reference/operators/add-custom-op.md
@@ -133,8 +133,8 @@ void KernelOne(const Ort::Custom::CudaContext& cuda_ctx,
   cuda_add(Z.NumberOfElement(), z_raw, X.Data(), Y.Data(), cuda_ctx.cuda_stream); // launch a kernel inside
 }
 ```
-Details could be found [here](https://github.com/microsoft/onnxruntime/tree/rel-1.16.0/onnxruntime/test/testdata/custom_op_library/cuda).
-To facilitate the development, a wide variety of cuda ep resources/configurations are exposed via CudaContext, please see the header and usage for detail.
+Full code could be found [here](https://github.com/microsoft/onnxruntime/tree/rel-1.17.0/onnxruntime/test/testdata/custom_op_library/cuda).
+To facilitate the development, a wide variety of cuda ep resources and configurations are exposed via CudaContext, please refer to the [header](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/include/onnxruntime/core/providers/cuda/cuda_resource.h#L8) for detail.
 
 For ROCM, it is like:
 

From f987f4378a46243363e6dcdaacd820800d4976f7 Mon Sep 17 00:00:00 2001
From: Randy Shuai
Date: Tue, 16 Jan 2024 18:21:16 -0800
Subject: [PATCH 3/4] tune links

---
 docs/performance/tune-performance/threading.md | 5 +++--
 docs/reference/operators/add-custom-op.md      | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/performance/tune-performance/threading.md b/docs/performance/tune-performance/threading.md
index f90f5ca0f48e3..2b546422c080c 100644
--- a/docs/performance/tune-performance/threading.md
+++ b/docs/performance/tune-performance/threading.md
@@ -202,5 +202,6 @@ int main() {
 Note that `CreateThreadCustomized` and `JoinThreadCustomized`, once set, will be applied to both ORT intra op and inter op thread pools uniformly.
 
 ## Usage in custom ops
-Since 1.17, custom op developers are entitled to parallelize their cpu code by ort intra-op thread pool.
-Please refer to the [API](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/include/onnxruntime/core/session/onnxruntime_cxx_inline.h#L1681), and [example](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/onnxruntime/test/testdata/custom_op_library/cpu/cpu_ops.cc#L87) for usage.
\ No newline at end of file
+Since 1.17, custom op developers can parallelize their CPU code with the ORT intra-op thread pool.
+
+Please refer to the [API](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/include/onnxruntime/core/session/onnxruntime_c_api.h#L4543) and the [example](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/onnxruntime/test/testdata/custom_op_library/cpu/cpu_ops.cc#L87) for usage.
\ No newline at end of file
diff --git a/docs/reference/operators/add-custom-op.md b/docs/reference/operators/add-custom-op.md
index 0dc362b9e0dac..981fe86eab9b0 100644
--- a/docs/reference/operators/add-custom-op.md
+++ b/docs/reference/operators/add-custom-op.md
@@ -133,8 +133,9 @@ void KernelOne(const Ort::Custom::CudaContext& cuda_ctx,
   cuda_add(Z.NumberOfElement(), z_raw, X.Data(), Y.Data(), cuda_ctx.cuda_stream); // launch a kernel inside
 }
 ```
-Full code could be found [here](https://github.com/microsoft/onnxruntime/tree/rel-1.17.0/onnxruntime/test/testdata/custom_op_library/cuda).
-To facilitate the development, a wide variety of cuda ep resources and configurations are exposed via CudaContext, please refer to the [header](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/include/onnxruntime/core/providers/cuda/cuda_resource.h#L8) for detail.
+Full example could be found [here](https://github.com/microsoft/onnxruntime/tree/rel-1.17.0/onnxruntime/test/testdata/custom_op_library/cuda).
+
+Note - to facilitate the development, a wide variety of cuda ep resources and configurations are exposed via CudaContext, please refer to the [header](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/include/onnxruntime/core/providers/cuda/cuda_resource.h#L8) for detail.
 
 For ROCM, it is like:
 

From d58a1b1fe32c6ffd175b45575f02c2f576da5317 Mon Sep 17 00:00:00 2001
From: Randy Shuai
Date: Tue, 16 Jan 2024 18:24:23 -0800
Subject: [PATCH 4/4] tune grammar

---
 docs/reference/operators/add-custom-op.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/docs/reference/operators/add-custom-op.md b/docs/reference/operators/add-custom-op.md
index 981fe86eab9b0..b4b43b2324eb5 100644
--- a/docs/reference/operators/add-custom-op.md
+++ b/docs/reference/operators/add-custom-op.md
@@ -133,9 +133,7 @@ void KernelOne(const Ort::Custom::CudaContext& cuda_ctx,
   cuda_add(Z.NumberOfElement(), z_raw, X.Data(), Y.Data(), cuda_ctx.cuda_stream); // launch a kernel inside
 }
 ```
-Full example could be found [here](https://github.com/microsoft/onnxruntime/tree/rel-1.17.0/onnxruntime/test/testdata/custom_op_library/cuda).
-
-Note - to facilitate the development, a wide variety of cuda ep resources and configurations are exposed via CudaContext, please refer to the [header](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/include/onnxruntime/core/providers/cuda/cuda_resource.h#L8) for detail.
+A full example can be found [here](https://github.com/microsoft/onnxruntime/tree/rel-1.17.0/onnxruntime/test/testdata/custom_op_library/cuda). To further facilitate development, a wide variety of CUDA EP resources and configurations are exposed via `CudaContext`; please refer to the [header](https://github.com/microsoft/onnxruntime/blob/rel-1.17.0/include/onnxruntime/core/providers/cuda/cuda_resource.h#L8) for details.
 
 For ROCM, it is like:
 
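For readers of the threading doc change above, here is a minimal sketch of what parallelizing a custom op's CPU loop on the 1.17 intra-op thread pool can look like. It assumes the `Ort::KernelContext::ParallelFor` C++ wrapper over the `KernelContext_ParallelFor` C API that the doc links to; the `Abs` kernel, its `AbsTask` struct, and the callback name are illustrative and not taken from the linked example.

```cpp
#include <cmath>
#include <cstddef>

#include "onnxruntime_cxx_api.h"

// Per-call state handed to the ParallelFor callback through usr_data.
struct AbsTask {
  const float* input;
  float* output;
};

// Callback run on the intra-op thread pool; the second argument is the
// iteration index handed out by ParallelFor.
static void AbsAt(void* usr_data, size_t i) {
  auto* task = static_cast<AbsTask*>(usr_data);
  task->output[i] = std::fabs(task->input[i]);
}

// Sketch of a custom op compute body that borrows the session's intra-op
// threads instead of spawning its own.
void ComputeAbs(OrtKernelContext* context) {
  Ort::KernelContext ctx(context);

  auto input = ctx.GetInput(0);
  auto shape_info = input.GetTensorTypeAndShapeInfo();
  const size_t total = shape_info.GetElementCount();
  auto output = ctx.GetOutput(0, shape_info.GetShape());

  AbsTask task{input.GetTensorData<float>(), output.GetTensorMutableData<float>()};

  // Passing 0 leaves batching to ORT; a non-zero value is a batching hint whose
  // exact semantics are described in the linked API documentation.
  ctx.ParallelFor(AbsAt, total, /*num_batch=*/0, &task);
}
```

Running the loop on the session's own intra-op threads keeps the custom op from oversubscribing CPU cores with an extra thread pool and lets it respect the threading options described earlier in that doc.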
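Likewise, a minimal sketch of a CUDA custom op that pulls more than the stream out of `CudaContext`. It assumes the member names (`cuda_stream`, `cublas_handle`) surfaced through the linked `cuda_resource.h`; the scaling kernel, the include paths, and the omitted error handling are illustrative only, not part of the linked example.

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Include paths below may differ depending on how the ORT headers are packaged.
#include "core/providers/cuda/cuda_context.h"
#include "onnxruntime_lite_custom_op.h"

// Copy X into Y, then scale Y by 2 in place, reusing the CUDA EP's stream and
// cuBLAS handle instead of creating our own (error checking omitted).
void KernelScale(const Ort::Custom::CudaContext& cuda_ctx,
                 const Ort::Custom::Tensor<float>& X,
                 Ort::Custom::Tensor<float>& Y) {
  const float alpha = 2.f;
  const int count = static_cast<int>(X.NumberOfElement());
  float* y_raw = Y.Allocate(X.Shape());

  cudaMemcpyAsync(y_raw, X.Data(), count * sizeof(float),
                  cudaMemcpyDeviceToDevice, cuda_ctx.cuda_stream);

  // Keep the borrowed handle on the EP's stream so the scale is ordered after the copy.
  cublasSetStream(cuda_ctx.cublas_handle, cuda_ctx.cuda_stream);
  cublasSscal(cuda_ctx.cublas_handle, count, &alpha, y_raw, 1);
}
```

Reusing the EP-owned stream and handles avoids the cost of per-op cuBLAS/cuDNN handle creation and keeps the custom op's work ordered with the rest of the graph on the EP's stream; registration is unchanged from the `KernelOne` example shown in the doc.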