Null Tensor as output when using TensorRT without session io bindings #18509

Open
cozeybozey opened this issue Nov 20, 2023 · 15 comments
Labels
ep:CUDA · ep:TensorRT · platform:windows

Comments

@cozeybozey

Describe the issue

When I use the TensorRT execution provider and call session.run like this:
```cpp
std::vector<Ort::Value> lOutputTensors = mOrtSession->Run(Ort::RunOptions{ nullptr }, mInputNodeNamePointers.data(), lInputTensors.data(),
    mInputNodeNamePointers.size(), mOutputNodeNamePointers.data(), mOutputNodeNamePointers.size());
```
Then I get errors when calling lOutputTensors[0].GetTensorMutableData() saying that the output is null. This does not happen when I use the CUDA execution provider, and when I use session io bindings with TensorRT I have no issues either. But there are some reasons why I do not want to use session io bindings, so I would like it to work without them. I did find someone online who seemingly got it to work without session io bindings, but I saw that he was using an experimental version of onnxruntime:
#14614

So a follow-up question: if the experimental version is necessary, how do I build it?

To reproduce

Using TensorRT as your execution provider and then calling session.run() like this:
```cpp
std::vector<Ort::Value> lOutputTensors = mOrtSession->Run(Ort::RunOptions{ nullptr }, mInputNodeNamePointers.data(), lInputTensors.data(),
    mInputNodeNamePointers.size(), mOutputNodeNamePointers.data(), mOutputNodeNamePointers.size());
```
Followed by lOutputTensors[0].GetTensorMutableData()

Urgency

No response

Platform

Windows

OS Version

10.0.19045

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

8.6.1.6

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider platform:windows issues related to the Windows platform labels Nov 20, 2023
@jywu-msft
Member

@chilo-ms , can you assist?

@chilo-ms
Contributor

chilo-ms commented Nov 29, 2023

You don't have to use the experimental version of onnxruntime; using the APIs from onnxruntime_cxx_api.h should work.
I can't repro on my side, so could you share your code that creates the OrtTensorRTProviderOptionsV2 and calls SessionOptionsAppendExecutionProvider_TensorRT_V2?
More lines of code might help us investigate.

https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#samples
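For reference, a minimal sketch of what that setup usually looks like with the public API from onnxruntime_cxx_api.h (the option keys and values below are illustrative placeholders, not the issue author's actual configuration):

```cpp
#include <onnxruntime_cxx_api.h>

// Sketch: create OrtTensorRTProviderOptionsV2, set options through the
// key/value interface, and append the TensorRT EP to the session options.
Ort::Session CreateTrtSession(Ort::Env& env, const ORTCHAR_T* model_path) {
    const OrtApi& api = Ort::GetApi();

    OrtTensorRTProviderOptionsV2* trt_options = nullptr;
    Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));

    // Placeholder keys/values; adjust for your setup.
    const char* keys[]   = {"device_id", "trt_engine_cache_enable"};
    const char* values[] = {"0",         "1"};
    Ort::ThrowOnError(api.UpdateTensorRTProviderOptions(trt_options, keys, values, 2));

    Ort::SessionOptions session_options;
    Ort::ThrowOnError(api.SessionOptionsAppendExecutionProvider_TensorRT_V2(
        session_options, trt_options));
    api.ReleaseTensorRTProviderOptions(trt_options);

    return Ort::Session(env, model_path, session_options);
}
```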

@cozeybozey
Author

This is where I am creating and appending the options:
[screenshot: code creating OrtTensorRTProviderOptionsV2 and appending the TensorRT EP]

@cozeybozey
Author

Hi, I was wondering if there is any follow up on this issue?

@jywu-msft
Member

Sorry for the delay. I didn't spot anything wrong with your options. Would it be possible for you to provide a small test case (model + code) which can reproduce the null behavior? I will ask @yf711 to investigate it.

@cozeybozey
Author

Unfortunately it is not super easy for me to provide a small test case, but I will look into it. In the meantime I also found that TensorRT does actually work for the first batch, but all subsequent batches do not. I only figured this out now because the first batch is always nonsense data from a warmup function, so I never checked the output there.

@cozeybozey
Author

Sorry for the delay on the small test case; there are some issues with yolov8 models and licensing. But I did some more testing myself and found that I do not run into the issue if I turn trt_cuda_graph_enable off. Does that make sense, or does that mean there is a bug in trt_cuda_graph_enable? I also found that this fixed my issue with running multiple parallel threads that call model.run on the same session.
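(For context, this flag is set through the same V2 key/value options interface; a minimal sketch, assuming trt_options is the OrtTensorRTProviderOptionsV2* created during setup:)

```cpp
// Sketch: disable CUDA graph capture on the TensorRT EP ("0" = off, "1" = on).
const char* keys[]   = {"trt_cuda_graph_enable"};
const char* values[] = {"0"};
Ort::ThrowOnError(Ort::GetApi().UpdateTensorRTProviderOptions(trt_options, keys, values, 1));
```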

@jywu-msft
Member

> Sorry for the delay on the small test case; there are some issues with yolov8 models and licensing. But I did some more testing myself and found that I do not run into the issue if I turn trt_cuda_graph_enable off. Does that make sense, or does that mean there is a bug in trt_cuda_graph_enable? I also found that this fixed my issue with running multiple parallel threads that call model.run on the same session.

Ah, that's a key piece of information. There are some known bugs in 1.16.1.
Would it be possible for you to test with the latest main (build from source), or wait until 1.17 is released later this month?

@cozeybozey
Author

Good to know that there might be a problem in 1.16.1, that is indeed the version I am using. I am not in a huge rush, so I will wait for the 1.17 release.

@cozeybozey
Author

Hello, it has been a while, but I am working with TensorRT again and I wanted to say that I am experiencing the same issue even with onnxruntime 1.17. It is also still the case that it does work when I turn off trt_cuda_graph_enable. So I was wondering whether you can maybe look into this issue again. And how problematic is it to turn trt_cuda_graph_enable off? Will that significantly impact performance?

@chilo-ms
Contributor

chilo-ms commented Jul 29, 2024

Hi,
The null tensor output you saw when using the TRT EP with trt_cuda_graph_enable enabled is due to not using io-bindings.

Please see the constraints of using cuda graph in the doc:

> By design, CUDA Graphs is designed to read from/write to the same CUDA virtual memory addresses during the graph replaying step as it does during the graph capturing step. Due to this requirement, usage of this feature requires using IOBinding so as to bind memory which will be used as input(s)/output(s) for the CUDA Graph machinery to read from/write to.

Also, if you want to use multithreading, please make sure only one thread initializes the ORT session instance, and then have multiple parallel threads that call model.run on the same session.

(Note: ORT should warn the users if they use cuda graph without io-bindings.)
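For reference, the io-binding pattern the doc is describing looks roughly like this; a minimal sketch, assuming session is an Ort::Session using the TRT EP and input_tensor is an Ort::Value already allocated in CUDA memory ("input" and "output" are placeholder node names):

```cpp
// Sketch: bind input/output through Ort::IoBinding so the CUDA addresses
// stay stable across runs, as CUDA graph capture/replay requires.
Ort::MemoryInfo cuda_mem_info("Cuda", OrtDeviceAllocator, /*device_id=*/0, OrtMemTypeDefault);

Ort::IoBinding binding(session);
binding.BindInput("input", input_tensor);     // must point at device memory
binding.BindOutput("output", cuda_mem_info);  // let ORT allocate the output on the device

session.Run(Ort::RunOptions{nullptr}, binding);
std::vector<Ort::Value> outputs = binding.GetOutputValues();
```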

@chilo-ms
Contributor

> And how problematic is it to turn trt_cuda_graph_enable off? Will that significantly impact performance?

It depends. The major advantage of CUDA graph is that it decreases kernel launch time, especially when the model has several CUDA kernels.

I suggest running the perf test against your model with cuda graph enabled and disabled to compare the results.
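A minimal timing sketch for such a comparison, assuming a session with pre-bound io built once with trt_cuda_graph_enable set to "1" and once to "0":

```cpp
#include <chrono>
#include <onnxruntime_cxx_api.h>

// Sketch: average steady-state Run() latency; the first call is excluded
// because it pays for engine build / graph capture.
double AverageRunMs(Ort::Session& session, Ort::IoBinding& binding, int iters) {
    session.Run(Ort::RunOptions{nullptr}, binding);  // warm-up
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        session.Run(Ort::RunOptions{nullptr}, binding);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count() / iters;
}
```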

@cozeybozey
Author

Alright, thanks for the quick and clear response! I will investigate whether it affects performance.

@cozeybozey
Author

> Also, if you want to use multithreading, please make sure only one thread initializes the ORT session instance, and then have multiple parallel threads that call model.run on the same session.

Am I misunderstanding or do the docs say that you cannot use multithreading at all?
[screenshot of the multithreading section of the docs]

I think multithreading in general will be difficult if you have to use the exact same memory location for the input data, since that will lead to multiple threads overwriting each other's data.

@chilo-ms
Contributor

chilo-ms commented Jul 30, 2024

> Also, if you want to use multithreading, please make sure only one thread initializes the ORT session instance, and then have multiple parallel threads that call model.run on the same session.

After looking closer at ORT's cuda graph replay code in InferenceSession.Run() and the cuda graph doc, I might be wrong about my previous multithreading statement.

Even though InferenceSession.Run() is thread-safe, the doc says "cuda graph objects are not internally synchronized and must not be accessed concurrently from multiple threads", meaning that if multiple threads call Run() on the same inference session, they access the same cuda graph object concurrently, which is not recommended.

Therefore, we suggest not using multithreading with cuda graph enabled on ORT TRT.
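For completeness, the pattern that remains safe is one shared session with cuda graph disabled and per-thread tensors; a minimal sketch (MakeInput and the node names are hypothetical placeholders):

```cpp
#include <thread>
#include <vector>
#include <onnxruntime_cxx_api.h>

Ort::Value MakeInput();  // hypothetical helper: builds one thread's input tensor

// Sketch: Run() itself is thread-safe, so with trt_cuda_graph_enable off,
// multiple threads may share one session as long as each owns its tensors.
void RunConcurrently(Ort::Session& session, int num_threads) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&session] {
            Ort::Value input = MakeInput();
            const char* input_names[]  = {"input"};   // placeholder node names
            const char* output_names[] = {"output"};
            auto outputs = session.Run(Ort::RunOptions{nullptr},
                                       input_names, &input, 1,
                                       output_names, 1);
        });
    }
    for (auto& w : workers) w.join();
}
```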
