From e05d040bd4fec552234365cc25e110eed5e7620a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Maximilian=20M=C3=BCller?=
Date: Wed, 29 May 2024 11:12:26 +0200
Subject: [PATCH 1/3] packaging doc addition

---
 docs/execution-providers/TensorRT-ExecutionProvider.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/execution-providers/TensorRT-ExecutionProvider.md b/docs/execution-providers/TensorRT-ExecutionProvider.md
index 538eff28e4ccc..0c96071b54308 100644
--- a/docs/execution-providers/TensorRT-ExecutionProvider.md
+++ b/docs/execution-providers/TensorRT-ExecutionProvider.md
@@ -499,6 +499,8 @@ $./onnxruntime_perf_test -e tensorrt -r 1 /model_database/transformer_model/mode
 * One constraint is that the entire model needs to be TRT eligible
 * When running the embedded engine model, the default setting is `trt_ep_context_embed_mode=0`, where the engine cache path is embedded and TRT EP will look for the engine cache on the disk. Alternatively, users can set `trt_ep_context_embed_mode=1`, embedding the entire engine binary data as a string in the model. However, this mode increases initialization time due to ORT graph optimization hashing the long string. Therefore, we recommend using `trt_ep_context_embed_mode=0`.
 * The default name of an embedded engine model will have `_ctx.onnx` appended to the end. Users can specify `trt_ep_context_file_path=my_ep_context_model.onnx` to overwrite this default name.
+* If an embedded engine is used, the TensorRT library **`nvinfer_builder_resource` is not required**; it is by far the largest TensorRT library. This makes it possible to ship a minimal set of libraries when a fixed set of models is deployed, packaged as precompiled engines.
+* Besides accelerating load time, embedded engines also **enable pacakging an externally compiled engine**, built with e.g. `trtexec`. A [python script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py) that packages such a precompiled engine into an ONNX file is included in the python tools.
 
 ## Performance Tuning
 For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tuning](./../performance/tune-performance/index.md)

From 0433cef4b8834d728d2b035cc5f54344f0739b9a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Maximilian=20M=C3=BCller?=
Date: Fri, 31 May 2024 11:31:33 +0200
Subject: [PATCH 2/3] typo fix

---
 docs/execution-providers/TensorRT-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/TensorRT-ExecutionProvider.md b/docs/execution-providers/TensorRT-ExecutionProvider.md
index 0c96071b54308..c2be2fb895826 100644
--- a/docs/execution-providers/TensorRT-ExecutionProvider.md
+++ b/docs/execution-providers/TensorRT-ExecutionProvider.md
@@ -500,7 +500,7 @@ $./onnxruntime_perf_test -e tensorrt -r 1 /model_database/transformer_model/mode
 * When running the embedded engine model, the default setting is `trt_ep_context_embed_mode=0`, where the engine cache path is embedded and TRT EP will look for the engine cache on the disk. Alternatively, users can set `trt_ep_context_embed_mode=1`, embedding the entire engine binary data as a string in the model. However, this mode increases initialization time due to ORT graph optimization hashing the long string. Therefore, we recommend using `trt_ep_context_embed_mode=0`.
 * The default name of an embedded engine model will have `_ctx.onnx` appended to the end. Users can specify `trt_ep_context_file_path=my_ep_context_model.onnx` to overwrite this default name.
 * If an embedded engine is used, the TensorRT library **`nvinfer_builder_resource` is not required**; it is by far the largest TensorRT library. This makes it possible to ship a minimal set of libraries when a fixed set of models is deployed, packaged as precompiled engines.
-* Besides accelerating load time, embedded engines also **enable pacakging an externally compiled engine**, built with e.g. `trtexec`. A [python script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py) that packages such a precompiled engine into an ONNX file is included in the python tools.
+* Besides accelerating load time, embedded engines also **enable packaging an externally compiled engine**, built with e.g. `trtexec`. A [python script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py) that packages such a precompiled engine into an ONNX file is included in the python tools.
 
 ## Performance Tuning
 For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tuning](./../performance/tune-performance/index.md)

From 338fe99f796d3fe261d3c7435e81286baf0d9fbf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Maximilian=20M=C3=BCller?=
Date: Mon, 3 Jun 2024 22:10:55 +0200
Subject: [PATCH 3/3] typo

---
 docs/execution-providers/TensorRT-ExecutionProvider.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/execution-providers/TensorRT-ExecutionProvider.md b/docs/execution-providers/TensorRT-ExecutionProvider.md
index c2be2fb895826..6245d64a674ae 100644
--- a/docs/execution-providers/TensorRT-ExecutionProvider.md
+++ b/docs/execution-providers/TensorRT-ExecutionProvider.md
@@ -491,7 +491,7 @@ Note: The example does not specify `trt_engine_cache_path` because `onnxruntime_
 $./onnxruntime_perf_test -e tensorrt -r 1 -i "trt_engine_cache_enable|true trt_dump_ep_context_model|true" /model_database/transformer_model/model.onnx
 ```
 Once the inference is complete, the embedded engine model is saved to disk. Users can then run this model just like the original one, but with a significantly quicker session creation time.
-```bask
+```bash
 $./onnxruntime_perf_test -e tensorrt -r 1 /model_database/transformer_model/model_ctx.onnx
 ```
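
For illustration, the embedded-engine workflow that the `onnxruntime_perf_test` commands in this patch series exercise can also be driven from the Python API. The sketch below is a minimal example, assuming the provider option keys named in the documented section; the model paths are placeholders taken from the examples above, and option value handling may vary between ONNX Runtime versions.

```python
import onnxruntime as ort

# Provider options mirroring the perf_test flags in the patch:
# enable the engine cache and dump an embedded-engine ("EP context") model.
trt_options = {
    "trt_engine_cache_enable": True,
    "trt_dump_ep_context_model": True,
    # 0 (recommended): embed the engine cache path, not the binary itself.
    "trt_ep_context_embed_mode": 0,
}

# First session compiles the TRT engine and dumps model_ctx.onnx to disk.
sess = ort.InferenceSession(
    "/model_database/transformer_model/model.onnx",
    providers=[("TensorrtExecutionProvider", trt_options)],
)

# Later sessions load the dumped model directly, skipping engine compilation,
# which is where the quicker session creation time comes from.
fast_sess = ort.InferenceSession(
    "/model_database/transformer_model/model_ctx.onnx",
    providers=["TensorrtExecutionProvider"],
)
```

Note that with `trt_ep_context_embed_mode=0` the dumped `model_ctx.onnx` still references the engine cache on disk, so both files have to be shipped together.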
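The externally-compiled-engine path described in the new bullets can be sketched as follows. The `trtexec` flags are standard TensorRT CLI options; the wrapper script's own argument names are not spelled out in this section, so the sketch defers to its `--help` rather than guessing an interface.

```bash
# 1) Build the engine outside of ORT with trtexec. The resulting plan file
#    is specific to the GPU and TensorRT version it was built with.
trtexec --onnx=model.onnx --saveEngine=model.engine

# 2) Inspect the wrapper script's interface, then use it to package
#    model.engine into an ONNX file that TRT EP can load as an embedded
#    engine model (argument names intentionally not assumed here).
python gen_trt_engine_wrapper_onnx_model.py --help
```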