Crash in TensorrtExecutionProvider when TensorRT EP fails to create engine from network #21567

Open
frenetj opened this issue Jul 30, 2024 · 7 comments
Labels: ep:TensorRT (issues related to TensorRT execution provider)

Comments

@frenetj

frenetj commented Jul 30, 2024

Describe the issue

When the TensorRT EP fails to create an engine from the network and the client calls Run() again on the same session, the following crash occurs:

#0  0x00007efc5442df84 in nvinfer1::ICudaEngine::getNbIOTensors() const (this=0x0) at tensort/include/NvInferRuntime.h:2160
#1  0x00007efc54451cf8 in onnxruntime::TensorrtExecutionProvider::<lambda(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*)>::operator()(onnxruntime::FunctionState, const OrtApi *, OrtKernelContext *) const (__closure=0x7efbfb1d8098, state=0x7efbfc81bf80, api=0x7f02b6d0b2e0 <ort_api_1_to_18>, context=0x7fff94d9ce50) at onnxruntime-1.18.0/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3395
#2  0x00007efc54487e8c in std::_Function_handler<onnxruntime::common::Status(void*, const OrtApi*, OrtKernelContext*), onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const onnxruntime::GraphViewer&, const onnxruntime::Node&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::vector<onnxruntime::NodeComputeInfo>&)::<lambda(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*)> >::_M_invoke(const std::_Any_data &, void *&&, const OrtApi *&&, OrtKernelContext *&&) (__functor=..., __args#0=@0x7fff94d9cbb8: 0x7efbfc81bf80, __args#1=@0x7fff94d9cbb0: 0x7f02b6d0b2e0 <ort_api_1_to_18>, __args#2=@0x7fff94d9cba8: 0x7fff94d9ce50) at /usr/include/c++/8/bits/std_function.h:283
#3  0x00007f02b59addac in std::function<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*)>::operator()(void*, OrtApi const*, OrtKernelContext*) const (this=0x7efbfb1d8098, __args#0=0x7efbfc81bf80, __args#1=0x7f02b6d0b2e0 <ort_api_1_to_18>, __args#2=0x7fff94d9ce50) at /usr/include/c++/8/bits/std_function.h:687
#4  0x00007f02b59a76b9 in onnxruntime::FunctionKernel::Compute(onnxruntime::OpKernelContext*) const (this=0x7efc014e2c00, context=0x7fff94d9ce50) at onnxruntime-1.18.0/onnxruntime/core/framework/func_kernel.h:52
#5  0x00007f02b5ac7d5c in onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (ctx=..., idx=4937, stream_idx=0, terminate_flag=@0x2716f308: false, session_scope=...) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:495
#6  0x00007f02b5abef4c in onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (this=0x3587a8e0, ctx=..., stream_idx=0, session_scope=..., terminate_flag=@0x2716f308: false, continue_flag=@0x7fff94d9d51f: true) at onnxruntime-1.18.0/onnxruntime/core/framework/execution_steps.cc:73
#7  0x00007f02b5acb5a3 in onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) (stream_idx=0, ctx=..., session_scope=..., terminate_flag=@0x2716f308: false, since=0) at onnxruntime-1.18.0/onnxruntime/core/framework/stream_execution_context.cc:222
#8  0x00007f02b5ac827b in onnxruntime::<lambda()>::operator()(void) const (__closure=0x7efc017dc3b0) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:589
#9  0x00007f02b5ac992f in std::_Function_handler<void(), onnxruntime::ExecuteThePlan(const onnxruntime::SessionState&, gsl::span<int const>, gsl::span<const OrtValue>, gsl::span<int const>, std::vector<OrtValue>&, const std::unordered_map<long unsigned int, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)> >&, const onnxruntime::logging::Logger&, const onnxruntime::DeviceStreamCollection*, bool const&, bool, bool)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#10 0x00007f02b4e39dac in std::function<void ()>::operator()() const (this=0x7fff94d9dbf0) at /usr/include/c++/8/bits/std_function.h:687
#11 0x00007f02b4e1ad49 in onnxruntime::concurrency::ThreadPool::Schedule(onnxruntime::concurrency::ThreadPool*, std::function<void ()>) (tp=0x0, fn=...) at onnxruntime-1.18.0/include/onnxruntime/core/platform/threadpool.h:233
#12 0x00007f02b5ac8608 in onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<int const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection const*, bool const&, bool, bool) (session_state=..., feed_mlvalue_idxs=..., feeds=..., fetch_mlvalue_idxs=..., fetches=std::vector of length 2, capacity 2 = {...}, fetch_allocators=std::unordered_map with 0 elements, logger=..., device_streams=0x1dbb3080, terminate_flag=@0x2716f308: false, only_execute_path_to_fetches=false, single_thread_mode=true) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:588
#13 0x00007f02b5a68157 in onnxruntime::utils::ExecuteGraphImpl(const onnxruntime::SessionState &, const onnxruntime::FeedsFetchesManager &, gsl::span<OrtValue const, 18446744073709551615>, std::vector<OrtValue, std::allocator<OrtValue> > &, const std::unordered_map<long unsigned int, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)>, std::hash<long unsigned int>, std::equal_to<long unsigned int>, std::allocator<std::pair<long unsigned int const, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)> > > > &, ExecutionMode, const bool &, const onnxruntime::logging::Logger &, onnxruntime::DeviceStreamCollection *, bool, onnxruntime::Stream *) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, fetch_allocators=std::unordered_map with 0 elements, execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x2716f308: false, logger=..., device_stream_collection=0x1dbb3080, only_execute_path_to_fetches=false, parent_stream=0x0) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:706
#14 0x00007f02b5a6878e in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollectionHolder&, bool, onnxruntime::Stream*) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x2716f308: false, logger=..., device_stream_collection_holder=..., only_execute_path_to_fetches=false, parent_stream=0x0) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:755
#15 0x00007f02b5a68868 in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, OrtRunOptions const&, onnxruntime::DeviceStreamCollectionHolder&, onnxruntime::logging::Logger const&) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, execution_mode=ORT_SEQUENTIAL, run_options=..., device_stream_collection_holder=..., logger=...) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:782
#16 0x00007f02b4e33fd5 in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >*, std::vector<OrtDevice, std::allocator<OrtDevice> > const*) (this=0x23f71cf0, run_options=..., feed_names=..., feeds=..., output_names=..., p_fetches=0x7fff94d9f1f0, p_fetches_device_info=0x0) at onnxruntime-1.18.0/onnxruntime/core/session/inference_session.cc:2531
#17 0x00007f02b4e351bc in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue const* const, 18446744073709551615ul>, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue*, 18446744073709551615ul>) (this=0x23f71cf0, run_options=..., feed_names=..., feeds=..., fetch_names=..., fetches=...) at onnxruntime-1.18.0/onnxruntime/core/session/inference_session.cc:2659
#18 0x00007f02b4d42116 in OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) (sess=0x23f71cf0, run_options=0x2716f2e0, input_names=0x1b75aff0, input=0x7efc5550bba0, input_len=2, output_names=0x1dea9570, output_names_len=2, output=0x7efbf802c200) at onnxruntime-1.18.0/onnxruntime/core/session/onnxruntime_c_api.cc:831

To reproduce

1. Run inference on a model that is too large to be cached (or otherwise force the TensorRT EP to return the error "TensorRT EP failed to create engine from network.").
2. Run inference again on the same session.
--> crash (a hedged sketch of this call pattern is shown below)
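
For illustration only, here is a minimal sketch of that call pattern using the C API from C++; the model path, input/output names and tensor setup are placeholders rather than our real application code:

// Sketch only: "large_model.onnx", "input"/"output" and the empty tensors are placeholders.
#include <cstdio>
#include "onnxruntime_c_api.h"

static const OrtApi* g_ort = nullptr;

// Print the error (if any) and keep going; the first Run() is expected to fail
// with "TensorRT EP failed to create engine from network."
static void Check(OrtStatus* status) {
  if (status != nullptr) {
    std::fprintf(stderr, "%s\n", g_ort->GetErrorMessage(status));
    g_ort->ReleaseStatus(status);
  }
}

int main() {
  g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  OrtEnv* env = nullptr;
  OrtSessionOptions* opts = nullptr;
  OrtSession* session = nullptr;
  Check(g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "trt_repro", &env));
  Check(g_ort->CreateSessionOptions(&opts));

  OrtTensorRTProviderOptions trt_options{};  // default TensorRT EP options, device 0
  Check(g_ort->SessionOptionsAppendExecutionProvider_TensorRT(opts, &trt_options));
  Check(g_ort->CreateSession(env, "large_model.onnx", opts, &session));

  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};
  OrtValue* inputs[1] = {nullptr};   // a real input tensor is created here in practice
  OrtValue* outputs[1] = {nullptr};

  // 1st Run(): engine creation fails and an error status is returned.
  Check(g_ort->Run(session, nullptr, input_names, inputs, 1, output_names, 1, outputs));
  // 2nd Run() on the same session: with 1.18.0 the TensorRT EP dereferences the
  // null engine (frame #0 in the stack trace above) and the process crashes.
  Check(g_ort->Run(session, nullptr, input_names, inputs, 1, output_names, 1, outputs));
  return 0;
}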

Urgency

No response

Platform

Linux

OS Version

ROCKY 8.5 (gcc-11.2.1, c++17)

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

CUDA 11.8

github-actions bot added the ep:TensorRT label on Jul 30, 2024
@yf711
Contributor

yf711 commented Jul 31, 2024

Could you try building ORT from this branch and see if it stops the crash?

@frenetj
Author

frenetj commented Aug 1, 2024

Hi Yifan,

Thanks for the quick fix; it works perfectly!

However, while compiling your branch with TensorRT 8.5.3, we got the following errors:

/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc: In member function 'onnxruntime::common::Status onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const onnxruntime::GraphViewer&, const onnxruntime::Node&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::vector<onnxruntime::NodeComputeInfo>&)':
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3055:17: error: 'class nvinfer1::IBuilderConfig' has no member named 'setHardwareCompatibilityLevel'
 3055 | trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3055:57: error: 'nvinfer1::HardwareCompatibilityLevel' has not been declared
 3055 | trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc: In lambda function:
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3644:21: error: 'class nvinfer1::IBuilderConfig' has no member named 'setHardwareCompatibilityLevel'
 3644 | trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3644:61: error: 'nvinfer1::HardwareCompatibilityLevel' has not been declared
 3644 | trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
gmake[2]: *** [CMakeFiles/onnxruntime_providers_tensorrt.dir/build.make:146: CMakeFiles/onnxruntime_providers_tensorrt.dir/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2267: CMakeFiles/onnxruntime_providers_tensorrt.dir/all] Error 2

We worked around this by guarding the calls to trt_config->setHardwareCompatibilityLevel with #if NV_TENSORRT_MAJOR >= 10:

diff --git a/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc b/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
index 2df4611743..b1e7147ea1 100644
--- a/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
+++ b/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
@@ -3051,12 +3051,13 @@ Status TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const GraphView
 
   std::string cache_hw_compat = "sm" + compute_capability;
   // Enable hardware compatility mode if assigned
+#if NV_TENSORRT_MAJOR >= 10
   if (engine_cache_enable_ && engine_hw_compatible_) {
     trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
     cache_hw_compat = "_sm80+";
     LOGS_DEFAULT(VERBOSE) << "[TensorRT EP] Hardware compatibility is enabled when loading and capturing engine cache.";
   }
-
+#endif
   // Name the engine cache based on GPU compute capacity and reduce the chance of loading an incompatible cache
   // Note: Engine cache generated on a GPU with large memory might not be loadable on a GPU with smaller memory, even if they share the same compute capacity
   const std::string cache_path_prefix = cache_path + cache_hw_compat;
@@ -3639,12 +3640,13 @@ Status TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const GraphView
     }
   }
 
+#if NV_TENSORRT_MAJOR >= 10
   // Enable hardware compatility mode if assigned
   if (trt_state->engine_hw_compatible) {
     trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
     LOGS_DEFAULT(INFO) << "[TensorRT EP] Re-generate engine with hardware compatibility enabled.";
   }
-
+#endif
   // Build engine
   std::unique_ptr<nvinfer1::IHostMemory> serialized_engine;
   {

Would it be possible for you to also make this change?

@frenetj
Author

frenetj commented Aug 1, 2024

Note that GitHub's formatting is not rendering the diff in the second part of the above comment properly. Please read it as plain text.

@yf711
Contributor

yf711 commented Aug 2, 2024

Hi @frenetj, ORT has supported TRT 8.6 since release 1.15 and has added features that are incompatible with older TRT 8.x versions.
Please see the TRT version requirements at https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html
We recommend using the latest TRT 10.x, as ORT will gradually drop support for TRT 8.6 in the future.

@frenetj
Author

frenetj commented Aug 8, 2024

Hello @yf711, using TRT 8.6 with this fix works perfectly. Thanks a lot!

@frenetj frenetj closed this as completed Aug 8, 2024
yf711 added a commit that referenced this issue Aug 9, 2024
…which was failed to generate trt_engine previously (#21621)

### Description
Add null_ptr check to avoid crash when running a session which previously failed to generate trt_engine.


### Motivation and Context

Reported and verified by
#21567
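
For readers following along, here is a standalone sketch of the guard pattern the PR describes. It is illustrative only: FakeEngine and ComputeSketch are stand-ins for nvinfer1::ICudaEngine and the TensorRT EP's per-node compute function, not the actual code from the fix.

// Illustrative only: shows why returning an error status beats dereferencing
// a null engine pointer when a previous Run() failed to build the engine.
#include <iostream>
#include <string>

struct Status {           // stand-in for onnxruntime::common::Status
  bool ok;
  std::string message;
};

struct FakeEngine {       // stand-in for nvinfer1::ICudaEngine
  int getNbIOTensors() const { return 2; }
};

Status ComputeSketch(const FakeEngine* trt_engine) {
  if (trt_engine == nullptr) {
    // Previous Run() failed to create the engine: report it instead of
    // crashing like frame #0 in the stack trace above.
    return {false, "TensorRT EP failed to create engine from network on a "
                   "previous Run(); no engine is available."};
  }
  return {true, "engine has " + std::to_string(trt_engine->getNbIOTensors()) + " IO tensors"};
}

int main() {
  std::cout << ComputeSketch(nullptr).message << "\n";  // error path, no crash
  FakeEngine engine;
  std::cout << ComputeSketch(&engine).message << "\n";  // normal path
  return 0;
}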
@frenetj
Author

frenetj commented Oct 24, 2024

Hello @yf711, the fix doesn't seem to have been integrated in the latest release (1.19.2).

@frenetj frenetj reopened this Oct 24, 2024
@yf711
Contributor

yf711 commented Oct 24, 2024

Hi @frenetj, thanks for the notice.
I just found that my fix didn't make it into 1.19, but it will be included in the upcoming 1.20 release, which is targeted for early next month.
You can also build from the rel-1.20.0 branch and check whether it works as expected in your case.
