ONNXRuntime 1.18 crashing with TensorRT EP when dealing with big inputs #21001
Comments
The error message "TensorRT EP failed to create engine from network" indicates something went wrong when the TRT EP calls TRT's API buildSerializedNetwork(), and since it happens when dealing with large images, I suspect it's due to OOM. Could you increase the trt_max_workspace_size<https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#trt_max_workspace_size> to see? The default is 1 GB. Also, quick question: can you repro the issue using trtexec? |
Hi,
I tried with trt_max_workspace_size<https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#trt_max_workspace_size> set to 2G, 4G, and 8G with the same result, also getting this additional warning if it is set greater than 1G:
2024-06-13 07:26:15.840575541 [W:onnxruntime:CF, tensorrt_execution_provider.cc:1479 TensorrtExecutionProvider] [TensorRT EP] TensorRT option trt_max_workspace_size must be a positive integer value. Set it to 1073741824 (1GB)
Not being really familiar with trtexec, I tried just specifying the ONNX model, and it failed with:
[06/13/2024-17:20:39] [E] [TRT] ModelImporter.cpp:732: ERROR: builtin_op_importers.cpp:4531 In function importSlice:
[8] Assertion failed: (axes.allValuesKnown()) && "This version of TensorRT does not support dynamic axes."
[06/13/2024-17:20:39] [E] Failed to parse onnx file
[06/13/2024-17:20:39] [I] Finish parsing network model
[06/13/2024-17:20:39] [E] Parsing model failed
[06/13/2024-17:20:39] [E] Failed to create engine from model or file.
[06/13/2024-17:20:39] [E] Engine set up failed
I used TensorRT 8.5.3 in this case.
|
Hmm, that's strange. Could you share the code that sets trt_max_workspace_size? As for trtexec, some models are not fully TRT-eligible, and that seems to be the case for your model, so trtexec won't be able to run them. How about trtexec with TRT 10? |
Found the problem on my side for trt_max_workspace_size and re-validated with 2G, 4G and 8G.
Still getting:
2024-06-14 11:34:30.389829469 [W:onnxruntime:CF, tensorrt_execution_provider.h:84 log] [2024-06-14 15:34:30 WARNING] Skipping tactic 0x0000000000000000 due to exception autotuning: CUDA error 2 allocating 6370102777-byte buffer: out of memory
2024-06-14 11:34:30.480769226 [E:onnxruntime:CF, tensorrt_execution_provider.h:82 log] [2024-06-14 15:34:30 ERROR] 4: [optimizer.cpp::computeCosts::3726] Error Code 4: Internal Error (Could not find any implementation for node {ForeignNode[onnx::Cast_507[Constant]...Concat_372]} due to insufficient workspace. See verbose log for requested sizes.)
2024-06-14 11:34:30.520078719 [E:onnxruntime:CF, tensorrt_execution_provider.h:82 log] [2024-06-14 15:34:30 ERROR] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
2024-06-14 11:34:30.520215247 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_16074816800397161377_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_16074816800397161377_0_0' Status Message: TensorRT EP failed to create engine from network.
We already sent the model (FLMFRIFE_Untrained.onnx) to a member of the ONNX Runtime team: Scott McKay.
|
We'll sync with @skottmckay to get the model. |
What is trt_max_workspace_size? |
With this code:
static const size_t nbGig = getenv("ENV_TRT_WRKS_SZ") ? atoi(getenv("ENV_TRT_WRKS_SZ")) : 1;
trto.trt_max_workspace_size = nbGig * 1073741824;
I tried with ENV_TRT_WRKS_SZ = 1,2,4 and 8 with the same result.
|
The value of trt_max_workspace_size determines the size limit of the TRT memory pool.
The error message shows "insufficient workspace", so it seems even 8G is not enough. BTW, you can also monitor GPU memory usage while running the inference to see the memory consumption. |
I did get the model from Scott, but I encountered a different issue which seems related to a Concat's axis attribute. trtexec 10.0.1 and 8.6 -> "This version of TensorRT does not support dynamic axes". Will check with Scott, or could you share the model again to make sure I'm using the same model as you? |
@chilo-ms Does trt_max_workspace_size depend on the GPU memory? The NVIDIA T4 has 16 GB GDDR6 memory, so can I set trt_max_workspace_size to 16 GB? |
Yes, give it a try. |
Update here. I saw a similar OOM message with the workspace size at 2G when running an input with two 4K frames (1x6x3840x2176)
Closing this since @chilo-ms provided the last update on increasing the workspace size. |
Describe the issue
Testing ONNXRuntime 1.18 with the TensorRT EP, either 10.0.1 or 8.5.3.
Using the onnxruntime-linux-x64-gpu-1.18.0.tgz package directly for the TensorRT 10.0.1 tests, and ONNX Runtime 1.18 recompiled against TensorRT 8.5.3 for the TensorRT 8.5.3 tests.
With TensorRT 10.0.1 our model crashes when dealing with 2 input images of 4K UHDTV (3840x2167)
with this error in the shell
Error [Non-zero status code returned while running TRTKernel_graph_torch_jit_5378504288688145163_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_5378504288688145163_0_0' Status Message: TensorRT EP failed to create engine from network.]
and this callstack
#5 0x00007fc7f0c30cf0 in () at /lib64/libpthread.so.0
#6 0x00007fbe6b9d8102 in onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(onnxruntime::GraphViewer const&, onnxruntime::Node const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, unsigned long, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, unsigned long> > >&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, unsigned long, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, unsigned long> > >&, std::vector<onnxruntime::NodeComputeInfo, std::allocatoronnxruntime::NodeComputeInfo >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}::operator()(void*, OrtApi const*, OrtKernelContext*) const [clone .isra.2141] ()
at PATH/libonnxruntime_providers_tensorrt.so
#7 0x00007fbe6b9dae50 in std::_Function_handler<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*), onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(onnxruntime::GraphViewer const&, onnxruntime::Node const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, unsigned long, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, unsigned long> > >&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, unsigned long, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, unsigned long> > >&, std::vector<onnxruntime::NodeComputeInfo, std::allocatoronnxruntime::NodeComputeInfo >&)::{lambda(void*, OrtApi const*, OrtKernelContext*)#3}>::_M_invoke(std::_Any_data const&, void*&&, OrtApi const*&&, OrtKernelContext*&&) ()
at PATH/libonnxruntime_providers_tensorrt.so
#8 0x00007fc7cd2923c1 in onnxruntime::FunctionKernel::Compute(onnxruntime::OpKernelContext*) const ()
at PATH/libonnxruntime.so.1.18.0
#9 0x00007fc7cd33272f in onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) () at PATH/libonnxruntime.so.1.18.0
#10 0x00007fc7cd32a5ef in onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) () at PATH/libonnxruntime.so.1.18.0
#11 0x00007fc7cd335723 in onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) () at PATH/libonnxruntime.so.1.18.0
#12 0x00007fc7cd3308d1 in onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<int const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection const*, bool const&, bool, bool) ()
at PATH/libonnxruntime.so.1.18.0
#13 0x00007fc7cd303ccf in onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection*, bool, onnxruntime::Stream*) ()
at PATH/libonnxruntime.so.1.18.0
#14 0x00007fc7cd30659c in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollectionHolder&, bool, onnxruntime::Stream*) ()
at PATH/libonnxruntime.so.1.18.0
#15 0x00007fc7cd30696a in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >&, ExecutionMode, OrtRunOptions const&, onnxruntime::DeviceStreamCollectionHolder&, onnxruntime::logging::Logger const&) () at PATH/libonnxruntime.so.1.18.0
#16 0x00007fc7ccb5500a in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator >, std::vector<OrtDevice, std::allocator > const) [clone .localalias.2030] () at PATH/libonnxruntime.so.1.18.0
#17 0x00007fc7ccb558e0 in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue const* const, 18446744073709551615ul>, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue*, 18446744073709551615ul>) () at PATH/libonnxruntime.so.1.18.0
#18 0x00007fc7ccae253c in OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) ()
at PATH/libonnxruntime.so.1.18.0
When running the same model with ONNXRuntime 1.18 and TensorRT 8.5.3, it is fine with these inputs (3849x2167), still works with 6K (6531x3100), and crashes with 8K (7680x4320).
When running with TensorRT 10.0.1 on a machine with lower compute capability (for example, one where nvidia-smi --query-gpu=compute_cap --format=csv returns 6.1), ONNXRuntime crashes with the same error/callstack with 2 HD images (1920x1080).
So here are the observations:
1- ONNXRuntime should never crash; it should return an error instead.
2- In our case, moving to TensorRT 10 is not an option, as it crashes on older machines and cannot deal with the same image sizes as TensorRT 8.5.3.
To reproduce
Using a model that takes big images as input with the TensorRT EP will make the software crash.
Urgency
No response
Platform
Linux
OS Version
Rocky Linux 8.7/9.3
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18
ONNX Runtime API
C++
Architecture
X86
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8 TensorRT 10.0.1 or 8.5.3