[TensorRT EP] How can I disable generating cache when using trt execution provider #22822
Hi @noahzn, your old engine/profile might not be reused by the TRT EP if the current inference parameters, cache name, environment variables, or hardware environment change. Here's more info about engine reusability: https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#trt_engine_cache_enable I wonder: if you replace your old engine/profile with the newly generated ones, is that new engine going to be reused, or does a newer engine still need to be generated?
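For reference, a minimal sketch of enabling the TRT engine cache through the Python API (assuming onnxruntime-gpu built with TensorRT support; the model path and cache directory below are placeholders, not taken from this issue):

```python
import onnxruntime as ort

# Sketch only: "model.onnx" and "./trt_cache" are placeholder paths.
trt_ep_options = {
    "trt_engine_cache_enable": True,        # serialize engines/profiles to disk for reuse
    "trt_engine_cache_path": "./trt_cache",  # directory holding the cached files
}

sess = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("TensorrtExecutionProvider", trt_ep_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)
```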
@yf711 Thanks for your reply!
It's not related to the dimension ranges of the intermediate layers' inputs. The engine cache name is generated from a hash of model metadata, such as the model's file name, the graph's input names, and the graph itself. Also, the cache name contains the compute capability, e.g. sm80. Does any of the metadata above change between the run that generated the cache and the run that is supposed to use the old cache?
@chilo-ms Thanks for your reply. I don't think the above metadata changes. The model's file name is never changed, and the input names of the graph are fixed in the ONNX model. Concerning the graph, since the number of keypoints differs between runs, it may change. So now I try to set the min. and max. shapes of some intermediate layers, and it seems new caches are generated less frequently than before. For example, these are the cached files in the folder.
I assume the shapes of the input/output tensors reflect the number of keypoints, right? I suspected it might be the model's file name. Could you confirm that you use the exact same path for the first run and the test run? As for setting trt_profile_min_shapes, trt_profile_max_shapes, and trt_profile_opt_shapes to cover the minimum and maximum input range: it doesn't address this issue directly, but it can prevent the TRT engine from being rebuilt across multiple inference runs with different input images.
Yes, in your case the model is being partitioned into multiple subgraphs that are run by the TRT EP, plus several other nodes that run on the CUDA EP or CPU.
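A sketch of what those explicit profile options could look like; the tensor names ("image", "keypoints") and the dimension ranges below are hypothetical placeholders, not taken from the actual model:

```python
# Sketch only: tensor names and shape ranges are made up for illustration.
trt_ep_options = {
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "./trt_cache",
    # Fixed-size image input plus a dynamic tensor whose first dimension
    # is assumed to vary with the number of detected keypoints.
    "trt_profile_min_shapes": "image:1x3x512x512,keypoints:1x256",
    "trt_profile_opt_shapes": "image:1x3x512x512,keypoints:512x256",
    "trt_profile_max_shapes": "image:1x3x512x512,keypoints:2048x256",
}
```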
@chilo-ms Thanks for your reply!
The input images always have the same size (e.g., 512x512), but different numbers of keypoints might be extracted. So, intermediate tensors can have different shapes.
I didn't change this part of the code. The ONNX model always has the same name. Now I set min./max. shapes, and it seems that TRT doesn't generate new caches as frequently as before.
As you can see, some new files were generated today. When new files are generated, are the old files still used? Only two new files were generated today, while more than 50 files were generated on Nov. 22. (I only pasted part of the list.)
I think there are two topics here:
For the 1st topic, I saw you mentioned that you set graph_optimization_level to ORT_DISABLE_ALL for the test run. What graph optimization level did you set for the first/warm-up run? For the 2nd topic, it's possible the min./max./opt. ranges are not large enough to cover all the input shapes or intermediate tensor shapes. If you turn on verbose logging, then once the engine files are created and you keep running multiple inference runs, you might see the corresponding log message for the last subgraph.
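A sketch of turning on verbose logging for a session, which surfaces the TRT EP messages about profile ranges and engine (re)builds; the model path is a placeholder:

```python
import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0  # 0 = VERBOSE: shows TRT EP engine/profile messages

sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    sess_options=so,
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)
```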
No, the old files won't be used. BTW, the
During warm-up I use
Yes, I think so. In our last two tests the caches were not updated.
Yes. Then the hash value should be the same between the warm-up run and the test run, and you won't see new engine caches being created. Could you help give it a try?
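A sketch of that suggestion: build the session the same way for the warm-up (cache-generating) run and the test run, so the hashed metadata, and therefore the engine cache name, matches. ORT_DISABLE_ALL is taken from the test-run setting quoted in this issue; whether it also suits the warm-up pipeline is an assumption:

```python
import onnxruntime as ort

def make_session(model_path: str) -> ort.InferenceSession:
    # Identical session options for warm-up and test runs -> identical cache name.
    so = ort.SessionOptions()
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    trt_ep_options = {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": "./trt_cache",  # placeholder directory
    }
    return ort.InferenceSession(
        model_path,
        sess_options=so,
        providers=[("TensorrtExecutionProvider", trt_ep_options),
                   "CUDAExecutionProvider", "CPUExecutionProvider"],
    )
```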
Thank you! I will let you know. @chilo-ms |
I have already generated some TRT caches when running inference on my ONNX model using the TRT Execution Provider. Then, for the online testing of my model, I set
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
, but it seems that new caches are still generated. I only want to reuse the old cache, not generate new ones. How can I do that? Thanks in advance!