A performance issue when using ONNX Runtime-TensorRT #19934
Comments
There are various strategies for reducing the session initialization time. We're in the process of putting together a doc to provide guidance.
Hi @jywu-msft
There are 2 areas which cost the most time during TensorRT EP initialization.
Hi @jywu-msft, I see. Regarding 2): why doesn't ONNX Runtime-TensorRT check whether trt_engine_cache_enable is set, and if it is, skip loading the IBuilder?
And regarding 1), I agree it is indeed not easy.
ORT TRT has a similar feature (starting from 1.17.0) which skips TRT builder instantiation and simply deserializes the engine cache to run inference. However, we still need an "ONNX" model to start with. So ORT TRT helps the user create an "embed engine" model, which is basically an ONNX model containing only one node that wraps the engine cache. Please see the highlighted part below to learn how to use the ORT TRT provider options to generate/run an embed engine model. BTW, we are working on documenting the usage of the embed engine model.
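The workflow above can be sketched with the V2 provider options API. This is a minimal sketch, not official sample code: it assumes ORT 1.17.0+, and the paths `model.onnx`, `trt_cache`, and `model_ctx.onnx` are hypothetical placeholders; check the TensorRT EP documentation for the exact option names available in your build.

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt_embed_engine");
  Ort::SessionOptions session_options;

  // Configure the TensorRT EP through the V2 provider options API.
  const OrtApi& api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2* trt_options = nullptr;
  Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));

  // Enable the engine cache and ask the EP to dump the "embed engine"
  // (EP context) model; "trt_cache" and "model_ctx.onnx" are placeholders.
  const char* keys[] = {"trt_engine_cache_enable", "trt_engine_cache_path",
                        "trt_dump_ep_context_model", "trt_ep_context_file_path"};
  const char* values[] = {"1", "trt_cache", "1", "model_ctx.onnx"};
  Ort::ThrowOnError(api.UpdateTensorRTProviderOptions(trt_options, keys, values, 4));

  session_options.AppendExecutionProvider_TensorRT_V2(*trt_options);

  // First run: builds the engine, fills the cache, and dumps "model_ctx.onnx".
  Ort::Session session(env, ORT_TSTR("model.onnx"), session_options);

  // On subsequent runs, create the session from "model_ctx.onnx" instead;
  // that model wraps the engine cache, so the TRT builder is never instantiated.
  api.ReleaseTensorRTProviderOptions(trt_options);
  return 0;
}
```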
Hi @chilo-ms, I tried to use trt_dump_ep_context_model as follows, but I got an error:
And I tried to modify the source code in a simple way: I commented out the fields for IBuilder, INetworkDefinition, and IParser, and found it could still work. This is a simplistic change, I know, and I will keep debugging to see whether it causes errors. Also, I want to know: if I already have a TensorRT engine in trt_engine_cache_path and trt_engine_cache_enable is set, is it correct not to initialize the IBuilder?
I think that if I comment out those fields for IBuilder, INetworkDefinition, and IParser, so that outside code cannot obtain the associated objects, and everything still works, that proves the outside code does not use those objects, right?
What ORT version are you using?
Your idea is basically right. In addition to the code path you found (in the EP's Compile) that involves builder instantiation, there is also builder instantiation in the EP's GetCapability. That's why we need the "embed engine" model to skip builder instantiation.
Hi @chilo-ms, thanks very much for your reply! I will try to remove the step that creates the IBuilder when the engine has already been generated. About EP GetCapability, I also have a question regarding this: "So that's why we need the 'Embed Engine' model to skip builder instantiation." I do not understand why the EP GetCapability method needs to create an IBuilder object; to my knowledge, the IBuilder is used to create other TRT objects, such as the INetworkDefinition. And if I already have a TRT engine built from the ONNX model, could I skip this step in the process?
Yes, my version is 1.16.3. At first I downloaded your 1.17.0 or 1.17.3 packages, but there was no DLL in them. Why don't the newest packages on NuGet contain the DLL? I will also build the DLL from the latest code.
Use the 1.17.1 NuGet package.
Hi @jywu-msft, I tried the 1.17.1 Microsoft.ML.OnnxRuntime.Gpu package, which depends on Microsoft.ML.OnnxRuntime.Gpu.Windows. Checking the structure of the 1.17.1 package, I found the directory is "buildTransitive" rather than "build", which prevents Visual Studio from loading the .props/.targets files. I'm confused; am I missing something?
Because the TRT parser needs a TRT network, which in turn depends on the TRT builder. Even if you have a TRT engine cache, for now you still need the embed engine model to skip that process. Also, I'm working on documentation to help users better understand this feature.
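The dependency chain described above (parser → network → builder) can be seen directly in the TensorRT C++ API. A minimal sketch, assuming TensorRT 8.x; `model.onnx` is a placeholder and error handling is omitted:

```cpp
#include <cstdint>
#include <cstdio>
#include <NvInfer.h>
#include <NvOnnxParser.h>

// Minimal logger required by the TensorRT API.
class Logger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
  }
};

int main() {
  Logger logger;
  // The chain: a parser requires a network, and a network requires a builder.
  nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
  const auto flags = 1U << static_cast<uint32_t>(
      nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
  nvinfer1::INetworkDefinition* network = builder->createNetworkV2(flags);
  nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, logger);
  // parser->parseFromFile("model.onnx", ...) would then populate the network.
  // Parsing an ONNX model always needs this whole chain, which is why the
  // builder gets instantiated even when a serialized engine cache exists.
  return 0;
}
```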
Describe the issue
I'm using ONNX Runtime with the TensorRT execution provider.
I found that every time I load the ONNX model it costs some time, sometimes shorter, sometimes longer.
So I printed the log.
I'd like to know what the red-marked areas spend their time doing.
To reproduce
Just use ONNX Runtime with the TensorRT execution provider to run an ONNX model.
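A minimal repro sketch along these lines, timing how long session creation with the TensorRT EP takes. This is illustrative only: `model.onnx` is a placeholder, and the legacy `OrtTensorRTProviderOptions` struct is zero-initialized for default settings.

```cpp
#include <chrono>
#include <cstdio>
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE, "trt_init_timing");
  Ort::SessionOptions session_options;

  // Default TensorRT EP settings; session creation is where the time goes.
  OrtTensorRTProviderOptions trt_options{};
  session_options.AppendExecutionProvider_TensorRT(trt_options);

  const auto t0 = std::chrono::steady_clock::now();
  Ort::Session session(env, ORT_TSTR("model.onnx"), session_options);
  const auto t1 = std::chrono::steady_clock::now();

  const auto ms =
      std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
  std::printf("session init took %lld ms\n", static_cast<long long>(ms));
  return 0;
}
```

With verbose logging enabled, the ORT log shows which initialization phases (e.g. graph partitioning, engine building) account for the elapsed time.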
Urgency
No response
Platform
Windows
OS Version
WIN10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.3
ONNX Runtime API
C++
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.6, TensorRT 8.6