Null Tensor as output when using TensorRT without session io bindings #18509
Comments
@chilo-ms, can you assist?
You don't have to use the experimental version of onnxruntime; using the APIs from onnxruntime_cxx_api.h should work. https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#samples
Hi, I was wondering if there is any follow-up on this issue?
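For context, appending the TensorRT EP with the public C++ API might look roughly like this; this is a minimal sketch, and the model path and logging setup are placeholders, not from this issue:

```cpp
#include <onnxruntime_cxx_api.h>

// Minimal sketch: create a session that uses the TensorRT EP through the
// public C++ API (no experimental headers needed).
int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "trt_example"};

  // Create default V2 TensorRT provider options.
  OrtTensorRTProviderOptionsV2* trt_options = nullptr;
  Ort::ThrowOnError(Ort::GetApi().CreateTensorRTProviderOptions(&trt_options));

  Ort::SessionOptions session_options;
  Ort::ThrowOnError(Ort::GetApi().SessionOptionsAppendExecutionProvider_TensorRT_V2(
      session_options, trt_options));
  Ort::GetApi().ReleaseTensorRTProviderOptions(trt_options);

  // Wide-string path because this issue is on Windows; "model.onnx" is a placeholder.
  Ort::Session session{env, L"model.onnx", session_options};
  return 0;
}
```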
Sorry for the delay. I didn't spot anything wrong with your options. Would it be possible for you to provide a small test case (model + code) that can reproduce the null behavior? I will ask @yf711 to investigate it.
Unfortunately it is not super easy for me to provide a small test case, but I will look into it. In the meantime I also found that TensorRT does actually work for the first batch, but all subsequent batches do not. I only figured this out now because the first batch is always nonsense data from a warmup function, so I never checked the output there.
Sorry for the delay on the small test case; there are some issues with yolov8 models and licensing. But I did some more testing myself and found that I do not run into the issue if I turn trt_cuda_graph_enable off. Does that make sense, or does it mean there is a bug in trt_cuda_graph_enable? I also found that this fixed my issue with running multiple parallel threads that call model.run on the same session.
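For reference, the flag in question can be toggled through the V2 provider options before the EP is appended; a sketch extending the session-setup snippet above ("0" disables cuda graph, "1" enables it):

```cpp
// Sketch: disable cuda graph capture in the TensorRT EP options.
// `trt_options` is the OrtTensorRTProviderOptionsV2* created earlier,
// before it is passed to SessionOptionsAppendExecutionProvider_TensorRT_V2.
const char* keys[] = {"trt_cuda_graph_enable"};
const char* values[] = {"0"};  // "1" would enable cuda graph
Ort::ThrowOnError(Ort::GetApi().UpdateTensorRTProviderOptions(
    trt_options, keys, values, 1));
```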
Ah, that's a key piece of information. There are some known bugs in 1.16.1; fixes are expected in the upcoming 1.17 release.
Good to know that there might be a problem in 1.16.1; that is indeed the version I am using. I am not in a huge rush, so I will wait for the 1.17 release.
Hello, it has been a while, but I am working with TensorRT again and I wanted to say that I am experiencing the same issue even with onnxruntime 1.17. It is also still the case that when I turn off trt_cuda_graph_enable it does work. So I was wondering whether you can maybe look into this issue again. And how problematic is it to turn trt_cuda_graph_enable off? Will that significantly impact the performance?
Hi, please see the constraints of using cuda graph in the doc. Also, if you want to use multithreading, please make sure only one thread initializes the ORT session instance, and then have the parallel threads call model.run on that same session. (Note: ORT should warn users if they use cuda graph without io-bindings.)
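Since cuda graph is expected to be used together with io-bindings, a minimal sketch of the binding flow with the C++ API might look like this; the tensor names "images"/"output0" and the device id are illustrative placeholders, not taken from this issue:

```cpp
#include <onnxruntime_cxx_api.h>
#include <vector>

// Illustrative sketch: run with explicit IO bindings, as the cuda graph
// feature expects.
void RunWithBindings(Ort::Session& session, Ort::Value& input_tensor) {
  Ort::IoBinding binding{session};
  binding.BindInput("images", input_tensor);  // placeholder input name

  // Let ORT allocate the output on the CUDA device.
  Ort::MemoryInfo cuda_mem_info{"Cuda", OrtDeviceAllocator, /*device_id=*/0,
                                OrtMemTypeDefault};
  binding.BindOutput("output0", cuda_mem_info);  // placeholder output name

  session.Run(Ort::RunOptions{nullptr}, binding);

  std::vector<Ort::Value> outputs = binding.GetOutputValues();
  // ... copy / inspect outputs ...
}
```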
It depends. The major advantage of CUDA graph is that it reduces kernel launch time, especially when the model has many CUDA kernels. I suggest you run a perf test against your model with cuda graph enabled and disabled to see the result.
Alright, thanks for the quick and clear response! I will investigate whether it affects performance.
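A rough sketch of such a comparison, assuming the session and the input/output name vectors are already set up as in the snippets above (the iteration count and single warmup run are arbitrary choices):

```cpp
#include <onnxruntime_cxx_api.h>
#include <chrono>
#include <vector>

// Rough perf sketch: average latency of session.Run over `iters` calls.
// Build the session once with cuda graph enabled and once disabled, then
// compare the returned averages.
double AverageRunMs(Ort::Session& session,
                    const std::vector<const char*>& input_names,
                    std::vector<Ort::Value>& inputs,
                    const std::vector<const char*>& output_names,
                    int iters = 100) {
  // Warmup so engine build and first-run costs are excluded from timing.
  session.Run(Ort::RunOptions{nullptr}, input_names.data(), inputs.data(),
              input_names.size(), output_names.data(), output_names.size());

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    session.Run(Ort::RunOptions{nullptr}, input_names.data(), inputs.data(),
                input_names.size(), output_names.data(), output_names.size());
  }
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count() / iters;
}
```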
After looking closer at ORT's cuda graph replay code in InferenceSession.Run() and the cuda graph doc, I might be wrong about my previous multithreading statement. Even though InferenceSession.Run() is thread-safe, per the doc, "cuda graph objects are not internally synchronized and must not be accessed concurrently from multiple threads". This means that if multiple threads call Run() on the same inference session, they access the same cuda graph object concurrently, which is not safe. Therefore, we suggest not using multithreading for cuda graph with ORT TRT.
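One way to respect that constraint, if a session with cuda graph enabled must be shared across threads, would be to serialize the Run() calls. This is a sketch of that idea, not a pattern recommended in this thread, and it trades away run-level parallelism:

```cpp
#include <onnxruntime_cxx_api.h>
#include <mutex>
#include <vector>

// Sketch: serialize Run() so the cuda graph object is never accessed
// from two threads at once.
std::mutex g_run_mutex;

std::vector<Ort::Value> SafeRun(Ort::Session& session,
                                const char* const* input_names,
                                const Ort::Value* inputs, size_t input_count,
                                const char* const* output_names,
                                size_t output_count) {
  std::lock_guard<std::mutex> lock(g_run_mutex);
  return session.Run(Ort::RunOptions{nullptr}, input_names, inputs,
                     input_count, output_names, output_count);
}
```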
Describe the issue
When I use the TensorRT execution provider and I use session.run like this:
```cpp
std::vector<Ort::Value> lOutputTensors = mOrtSession->Run(
    Ort::RunOptions{nullptr}, mInputNodeNamePointers.data(), lInputTensors.data(),
    mInputNodeNamePointers.size(), mOutputNodeNamePointers.data(),
    mOutputNodeNamePointers.size());
```
Then I get errors when calling lOutputTensors[0].GetTensorMutableData() saying that the output is null. This does not happen when I use the CUDA execution provider. Also, when I use session io bindings with TensorRT I have no issues, but there are some reasons that lead to me not wanting to use session io bindings, so I would like it to work without them. I did find someone online who seemingly got it to work without session io bindings, but I saw that he was using an experimental version of onnxruntime:
#14614
So I guess a follow-up question is: if it is necessary, how do I build the experimental version of onnxruntime?
To reproduce
Using TensorRT as your execution provider and then calling session.run() like this:
```cpp
std::vector<Ort::Value> lOutputTensors = mOrtSession->Run(
    Ort::RunOptions{nullptr}, mInputNodeNamePointers.data(), lInputTensors.data(),
    mInputNodeNamePointers.size(), mOutputNodeNamePointers.data(),
    mOutputNodeNamePointers.size());
```
Followed by lOutputTensors[0].GetTensorMutableData()
Urgency
No response
Platform
Windows
OS Version
10.0.19045
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.16.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
8.6.1.6