From e05d040bd4fec552234365cc25e110eed5e7620a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Maximilian=20M=C3=BCller?=
Date: Wed, 29 May 2024 11:12:26 +0200
Subject: [PATCH 1/3] packaging doc addition

---
 docs/execution-providers/TensorRT-ExecutionProvider.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/execution-providers/TensorRT-ExecutionProvider.md b/docs/execution-providers/TensorRT-ExecutionProvider.md
index 538eff28e4ccc..0c96071b54308 100644
--- a/docs/execution-providers/TensorRT-ExecutionProvider.md
+++ b/docs/execution-providers/TensorRT-ExecutionProvider.md
@@ -499,6 +499,8 @@ $./onnxruntime_perf_test -e tensorrt -r 1 /model_database/transformer_model/mode
 * One constraint is that the entire model needs to be TRT eligible
 * When running the embedded engine model, the default setting is `trt_ep_context_embed_mode=0`, where the engine cache path is embedded and TRT EP will look for the engine cache on the disk. Alternatively, users can set `trt_ep_context_embed_mode=1`, embedding the entire engine binary data as a string in the model. However, this mode increases initialization time due to ORT graph optimization hashing the long string. Therefore, we recommend using `trt_ep_context_embed_mode=0`.
 * The default name of an embedded engine model will have `_ctx.onnx` appended to the end. Users can specify `trt_ep_context_file_path=my_ep_context_model.onnx` to overwrite this default name.
+* If an embedded engine is used, the TensorRT library **`nvinfer_builder_resource` is not required**; it is by far the largest TensorRT library. This makes it possible to ship a minimal set of libraries when a fixed set of models is deployed, packaged as precompiled engines.
+* Besides accelerating load time, embedded engines also **enable pacakging an externally compiled engine**, built with e.g. `trtexec`. A [python script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py) that packages such a precompiled engine into an ONNX file is included in the python tools.
 
 ## Performance Tuning
 For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tuning](./../performance/tune-performance/index.md)

From 0433cef4b8834d728d2b035cc5f54344f0739b9a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Maximilian=20M=C3=BCller?=
Date: Fri, 31 May 2024 11:31:33 +0200
Subject: [PATCH 2/3] typo fix

---
 docs/execution-providers/TensorRT-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/TensorRT-ExecutionProvider.md b/docs/execution-providers/TensorRT-ExecutionProvider.md
index 0c96071b54308..c2be2fb895826 100644
--- a/docs/execution-providers/TensorRT-ExecutionProvider.md
+++ b/docs/execution-providers/TensorRT-ExecutionProvider.md
@@ -500,7 +500,7 @@ $./onnxruntime_perf_test -e tensorrt -r 1 /model_database/transformer_model/mode
 * When running the embedded engine model, the default setting is `trt_ep_context_embed_mode=0`, where the engine cache path is embedded and TRT EP will look for the engine cache on the disk. Alternatively, users can set `trt_ep_context_embed_mode=1`, embedding the entire engine binary data as a string in the model. However, this mode increases initialization time due to ORT graph optimization hashing the long string. Therefore, we recommend using `trt_ep_context_embed_mode=0`.
 * The default name of an embedded engine model will have `_ctx.onnx` appended to the end. Users can specify `trt_ep_context_file_path=my_ep_context_model.onnx` to overwrite this default name.
 * If an embedded engine is used, the TensorRT library **`nvinfer_builder_resource` is not required**; it is by far the largest TensorRT library. This makes it possible to ship a minimal set of libraries when a fixed set of models is deployed, packaged as precompiled engines.
-* Besides accelerating load time, embedded engines also **enable pacakging an externally compiled engine**, built with e.g. `trtexec`. A [python script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py) that packages such a precompiled engine into an ONNX file is included in the python tools.
+* Besides accelerating load time, embedded engines also **enable packaging an externally compiled engine**, built with e.g. `trtexec`. A [python script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py) that packages such a precompiled engine into an ONNX file is included in the python tools.
 
 ## Performance Tuning
 For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tuning](./../performance/tune-performance/index.md)

From 338fe99f796d3fe261d3c7435e81286baf0d9fbf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Maximilian=20M=C3=BCller?=
Date: Mon, 3 Jun 2024 22:10:55 +0200
Subject: [PATCH 3/3] typo

---
 docs/execution-providers/TensorRT-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/TensorRT-ExecutionProvider.md b/docs/execution-providers/TensorRT-ExecutionProvider.md
index c2be2fb895826..6245d64a674ae 100644
--- a/docs/execution-providers/TensorRT-ExecutionProvider.md
+++ b/docs/execution-providers/TensorRT-ExecutionProvider.md
@@ -491,7 +491,7 @@ Note: The example does not specify `trt_engine_cache_path` because `onnxruntime_
 $./onnxruntime_perf_test -e tensorrt -r 1 -i "trt_engine_cache_enable|true trt_dump_ep_context_model|true" /model_database/transformer_model/model.onnx
 ```
 Once the inference is complete, the embedded engine model is saved to disk. Users can then run this model just like the original one, but with a significantly quicker session creation time.
-```bask
+```bash
 $./onnxruntime_perf_test -e tensorrt -r 1 /model_database/transformer_model/model_ctx.onnx
 ```
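
For illustration, the embedded-engine workflow that the `onnxruntime_perf_test` commands in this patch series exercise can also be driven from the Python API. The sketch below is a minimal example, assuming the provider option keys named in the documented section; the model paths are placeholders taken from the examples above, and option value handling may vary between ONNX Runtime versions.

```python
import onnxruntime as ort

# Provider options mirroring the perf_test flags in the patch:
# enable the engine cache and dump an embedded-engine ("EP context") model.
trt_options = {
    "trt_engine_cache_enable": True,
    "trt_dump_ep_context_model": True,
    # 0 (recommended): embed the engine cache path, not the binary itself.
    "trt_ep_context_embed_mode": 0,
}

# First session compiles the TRT engine and dumps model_ctx.onnx to disk.
sess = ort.InferenceSession(
    "/model_database/transformer_model/model.onnx",
    providers=[("TensorrtExecutionProvider", trt_options)],
)

# Later sessions load the dumped model directly, skipping engine compilation,
# which is where the quicker session creation time comes from.
fast_sess = ort.InferenceSession(
    "/model_database/transformer_model/model_ctx.onnx",
    providers=["TensorrtExecutionProvider"],
)
```

Note that with `trt_ep_context_embed_mode=0` the dumped `model_ctx.onnx` still references the engine cache on disk, so both files have to be shipped together.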
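The externally-compiled-engine path described in the new bullets can be sketched as follows. The `trtexec` flags are standard TensorRT CLI options; the wrapper script's own argument names are not spelled out in this section, so the sketch defers to its `--help` rather than guessing an interface.

```bash
# 1) Build the engine outside of ORT with trtexec. The resulting plan file
#    is specific to the GPU and TensorRT version it was built with.
trtexec --onnx=model.onnx --saveEngine=model.engine

# 2) Inspect the wrapper script's interface, then use it to package
#    model.engine into an ONNX file that TRT EP can load as an embedded
#    engine model (argument names intentionally not assumed here).
python gen_trt_engine_wrapper_onnx_model.py --help
```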