is that possible to use trt file on onnxruntime? #17811
Comments
The only supported model format in ORT is ONNX. If you want to run a TensorRT model, you need to either (1) convert it back to ONNX or (2) use the existing ONNX model directly. If the goal is to compare performance, using the existing ONNX model should be enough.
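For option (2), a minimal sketch, assuming the original ONNX file is still available (the model.onnx path and the input_array NumPy input below are placeholders):

import onnxruntime as ort

# Let ONNX Runtime's TensorRT EP build the engine internally from the ONNX graph,
# falling back to the CUDA EP for any unsupported nodes.
session = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
outputs = session.run(None, {session.get_inputs()[0].name: input_array})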
You can follow the stable diffusion example to run the TensorRT EP. The key part is to construct provider options from the input profile to support dynamic shapes. The other part is just like any other provider: create a session from the ONNX file, then run the model with IO binding and CUDA graph.
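A rough sketch of such provider options, assuming a dynamic NCHW input named "input" (the input name, shapes, and cache path are placeholders, and the trt_profile_*_shapes options require a sufficiently recent ONNX Runtime build):

trt_provider_options = {
    # Cache the engine that TensorRT builds so later sessions skip the build step.
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "./trt_cache",
    # Explicit optimization profile covering the dynamic batch dimension.
    "trt_profile_min_shapes": "input:1x3x410x580",
    "trt_profile_opt_shapes": "input:1x3x410x580",
    "trt_profile_max_shapes": "input:4x3x410x580",
}
session = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("TensorrtExecutionProvider", trt_provider_options),
        "CUDAExecutionProvider",
    ],
)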
@tianleiwu, do you have a simpler example? The example mentioned is hard to read.
If we don't have such an example, we need to create one ASAP. PyTorch does this kind of introductory example very well.
@wschin, you are right. The stable diffusion example is complex and uses many advanced settings, so it might not be good as a tutorial. Simple examples can be found in the documentation: https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html
@MinGiSa, as @tianleiwu mentioned, you need to feed your existing ONNX file to launch onnxruntime with the TensorRT and CUDA execution providers. Please don't feed TensorRT engine files to onnxruntime directly.
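In the reproduction script below, the change would roughly be to pass the path of the original .onnx file instead of the deserialized engine (the path string is a placeholder):

ortSession = ort.InferenceSession(
    r"path to the original onnx file here",  # the .onnx model, not the .trt engine
    providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider'],
)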
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
I have converted an existing ONNX file to a TensorRT file (engine). I would like to perform inference using the converted TensorRT file with ONNX Runtime, but I am unsure of the process. When I attempt inference using the following code:
ortSession = ort.InferenceSession(engine, providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider'])
I encounter the following error:
Traceback (most recent call last):
File "packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 405, in init
raise TypeError(f"Unable to load from type '{type(path_or_bytes)}'")
TypeError: Unable to load from type '<class 'tensorrt.tensorrt.ICudaEngine'>'
It seems there is an issue with loading the serialized TensorRT engine in ONNX Runtime. How can I resolve this problem?
To reproduce
import tensorrt as trt
import os
import cv2
import time
import numpy as np
from cuda import cuda
import warnings
import onnxruntime as ort
warnings.filterwarnings("ignore")
def preprocessImage(imagePath, imageSize):
    # Load the image, convert BGR -> RGB, and build an NCHW float blob scaled to [0, 1].
    img = cv2.imread(imagePath, cv2.IMREAD_COLOR)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    image = cv2.dnn.blobFromImage(img, 1 / 255, imageSize, (0, 0, 0))
    return image
def trtInference(engine, context, data):
    # Count the engine's input and output bindings.
    nInput = np.sum([engine.binding_is_input(i) for i in range(engine.num_bindings)])
    nOutput = engine.num_bindings - nInput
    # print('nInput:', nInput)
    # print('nOutput:', nOutput)
os.environ['CUDA_MODULE_LOADING'] = 'LAZY'
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_engine_path = r"trt engine path here"
imagePath = r'image path here'
imageSize = (580, 410)
batchSize = 1
processedImage = preprocessImage(imagePath, imageSize)
with open(trt_engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
trtStart = time.time()
trtOutputs = trtInference(engine, context, processedImage)
trtOutputs = np.array(trtOutputs[1]).reshape(batchSize, -1)
trtEnd = time.time()
print('--tensorrt--')
print(trtOutputs.shape)
print(trtOutputs[0][:10])
print(np.argmax(trtOutputs, axis=1))
print('Time: ', trtEnd - trtStart)
def trtInferenceWithONNXRuntime(ort_session, data):
    # Run the preprocessed image through the ONNX Runtime session.
    ort_inputs = {ort_session.get_inputs()[0].name: data}
    ort_outputs = ort_session.run(None, ort_inputs)
    return ort_outputs
ortSession = ort.InferenceSession(engine, providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider'])
ortStart = time.time()
ortOutputs = trtInferenceWithONNXRuntime(ortSession, processedImage)
ortOutputs = np.array(ortOutputs[0]).reshape(batchSize, -1)
ortEnd = time.time()
print('--onnxruntime--')
print(ortOutputs.shape)
print(ortOutputs[0][:10])
print(np.argmax(ortOutputs, axis=1))
print('Time: ', ortEnd - ortStart)
Urgency
No response
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA, TensorRT
Execution Provider Library Version
torch 2.0.1, CUDA 11.7