Crash in TensorrtExecutionProvider when TensorRT EP fails to create engine from network #21567

Open
frenetj opened this issue Jul 30, 2024 · 7 comments
Labels: ep:TensorRT (issues related to TensorRT execution provider)

Comments

@frenetj

frenetj commented Jul 30, 2024

Describe the issue

When the TensorRT EP fails to create an engine from the network and the client calls Run() again on the same session, the following crash occurs:

#0  0x00007efc5442df84 in nvinfer1::ICudaEngine::getNbIOTensors() const (this=0x0) at tensort/include/NvInferRuntime.h:2160
#1  0x00007efc54451cf8 in onnxruntime::TensorrtExecutionProvider::<lambda(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*)>::operator()(onnxruntime::FunctionState, const OrtApi *, OrtKernelContext *) const (__closure=0x7efbfb1d8098, state=0x7efbfc81bf80, api=0x7f02b6d0b2e0 <ort_api_1_to_18>, context=0x7fff94d9ce50) at onnxruntime-1.18.0/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3395
#2  0x00007efc54487e8c in std::_Function_handler<onnxruntime::common::Status(void*, const OrtApi*, OrtKernelContext*), onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const onnxruntime::GraphViewer&, const onnxruntime::Node&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::vector<onnxruntime::NodeComputeInfo>&)::<lambda(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*)> >::_M_invoke(const std::_Any_data &, void *&&, const OrtApi *&&, OrtKernelContext *&&) (__functor=..., __args#0=@0x7fff94d9cbb8: 0x7efbfc81bf80, __args#1=@0x7fff94d9cbb0: 0x7f02b6d0b2e0 <ort_api_1_to_18>, __args#2=@0x7fff94d9cba8: 0x7fff94d9ce50) at /usr/include/c++/8/bits/std_function.h:283
#3  0x00007f02b59addac in std::function<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*)>::operator()(void*, OrtApi const*, OrtKernelContext*) const (this=0x7efbfb1d8098, __args#0=0x7efbfc81bf80, __args#1=0x7f02b6d0b2e0 <ort_api_1_to_18>, __args#2=0x7fff94d9ce50) at /usr/include/c++/8/bits/std_function.h:687
#4  0x00007f02b59a76b9 in onnxruntime::FunctionKernel::Compute(onnxruntime::OpKernelContext*) const (this=0x7efc014e2c00, context=0x7fff94d9ce50) at onnxruntime-1.18.0/onnxruntime/core/framework/func_kernel.h:52
#5  0x00007f02b5ac7d5c in onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (ctx=..., idx=4937, stream_idx=0, terminate_flag=@0x2716f308: false, session_scope=...) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:495
#6  0x00007f02b5abef4c in onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (this=0x3587a8e0, ctx=..., stream_idx=0, session_scope=..., terminate_flag=@0x2716f308: false, continue_flag=@0x7fff94d9d51f: true) at onnxruntime-1.18.0/onnxruntime/core/framework/execution_steps.cc:73
#7  0x00007f02b5acb5a3 in onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) (stream_idx=0, ctx=..., session_scope=..., terminate_flag=@0x2716f308: false, since=0) at onnxruntime-1.18.0/onnxruntime/core/framework/stream_execution_context.cc:222
#8  0x00007f02b5ac827b in onnxruntime::<lambda()>::operator()(void) const (__closure=0x7efc017dc3b0) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:589
#9  0x00007f02b5ac992f in std::_Function_handler<void(), onnxruntime::ExecuteThePlan(const onnxruntime::SessionState&, gsl::span<int const>, gsl::span<const OrtValue>, gsl::span<int const>, std::vector<OrtValue>&, const std::unordered_map<long unsigned int, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)> >&, const onnxruntime::logging::Logger&, const onnxruntime::DeviceStreamCollection*, bool const&, bool, bool)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#10 0x00007f02b4e39dac in std::function<void ()>::operator()() const (this=0x7fff94d9dbf0) at /usr/include/c++/8/bits/std_function.h:687
#11 0x00007f02b4e1ad49 in onnxruntime::concurrency::ThreadPool::Schedule(onnxruntime::concurrency::ThreadPool*, std::function<void ()>) (tp=0x0, fn=...) at onnxruntime-1.18.0/include/onnxruntime/core/platform/threadpool.h:233
#12 0x00007f02b5ac8608 in onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<int const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection const*, bool const&, bool, bool) (session_state=..., feed_mlvalue_idxs=..., feeds=..., fetch_mlvalue_idxs=..., fetches=std::vector of length 2, capacity 2 = {...}, fetch_allocators=std::unordered_map with 0 elements, logger=..., device_streams=0x1dbb3080, terminate_flag=@0x2716f308: false, only_execute_path_to_fetches=false, single_thread_mode=true) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:588
#13 0x00007f02b5a68157 in onnxruntime::utils::ExecuteGraphImpl(const onnxruntime::SessionState &, const onnxruntime::FeedsFetchesManager &, gsl::span<OrtValue const, 18446744073709551615>, std::vector<OrtValue, std::allocator<OrtValue> > &, const std::unordered_map<long unsigned int, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)>, std::hash<long unsigned int>, std::equal_to<long unsigned int>, std::allocator<std::pair<long unsigned int const, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)> > > > &, ExecutionMode, const bool &, const onnxruntime::logging::Logger &, onnxruntime::DeviceStreamCollection *, bool, onnxruntime::Stream *) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, fetch_allocators=std::unordered_map with 0 elements, execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x2716f308: false, logger=..., device_stream_collection=0x1dbb3080, only_execute_path_to_fetches=false, parent_stream=0x0) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:706
#14 0x00007f02b5a6878e in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollectionHolder&, bool, onnxruntime::Stream*) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x2716f308: false, logger=..., device_stream_collection_holder=..., only_execute_path_to_fetches=false, parent_stream=0x0) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:755
#15 0x00007f02b5a68868 in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, OrtRunOptions const&, onnxruntime::DeviceStreamCollectionHolder&, onnxruntime::logging::Logger const&) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, execution_mode=ORT_SEQUENTIAL, run_options=..., device_stream_collection_holder=..., logger=...) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:782
#16 0x00007f02b4e33fd5 in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >*, std::vector<OrtDevice, std::allocator<OrtDevice> > const*) (this=0x23f71cf0, run_options=..., feed_names=..., feeds=..., output_names=..., p_fetches=0x7fff94d9f1f0, p_fetches_device_info=0x0) at onnxruntime-1.18.0/onnxruntime/core/session/inference_session.cc:2531
#17 0x00007f02b4e351bc in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue const* const, 18446744073709551615ul>, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue*, 18446744073709551615ul>) (this=0x23f71cf0, run_options=..., feed_names=..., feeds=..., fetch_names=..., fetches=...) at onnxruntime-1.18.0/onnxruntime/core/session/inference_session.cc:2659
#18 0x00007f02b4d42116 in OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) (sess=0x23f71cf0, run_options=0x2716f2e0, input_names=0x1b75aff0, input=0x7efc5550bba0, input_len=2, output_names=0x1dea9570, output_names_len=2, output=0x7efbf802c200) at onnxruntime-1.18.0/onnxruntime/core/session/onnxruntime_c_api.cc:831

To reproduce

1. Run inference on a model that is too large to be cached (or otherwise force the TensorRT EP to return the error "TensorRT EP failed to create engine from network.").
2. Run inference again on the same session.
--> crash (a hedged sketch of this call pattern is shown below)
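
For illustration only, here is a minimal sketch of that call pattern using the C API from C++; the model path, input/output names and tensor setup are placeholders rather than our real application code:

// Sketch only: "large_model.onnx", "input"/"output" and the empty tensors are placeholders.
#include <cstdio>
#include "onnxruntime_c_api.h"

static const OrtApi* g_ort = nullptr;

// Print the error (if any) and keep going; the first Run() is expected to fail
// with "TensorRT EP failed to create engine from network."
static void Check(OrtStatus* status) {
  if (status != nullptr) {
    std::fprintf(stderr, "%s\n", g_ort->GetErrorMessage(status));
    g_ort->ReleaseStatus(status);
  }
}

int main() {
  g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  OrtEnv* env = nullptr;
  OrtSessionOptions* opts = nullptr;
  OrtSession* session = nullptr;
  Check(g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "trt_repro", &env));
  Check(g_ort->CreateSessionOptions(&opts));

  OrtTensorRTProviderOptions trt_options{};  // default TensorRT EP options, device 0
  Check(g_ort->SessionOptionsAppendExecutionProvider_TensorRT(opts, &trt_options));
  Check(g_ort->CreateSession(env, "large_model.onnx", opts, &session));

  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};
  OrtValue* inputs[1] = {nullptr};   // a real input tensor is created here in practice
  OrtValue* outputs[1] = {nullptr};

  // 1st Run(): engine creation fails and an error status is returned.
  Check(g_ort->Run(session, nullptr, input_names, inputs, 1, output_names, 1, outputs));
  // 2nd Run() on the same session: with 1.18.0 the TensorRT EP dereferences the
  // null engine (frame #0 in the stack trace above) and the process crashes.
  Check(g_ort->Run(session, nullptr, input_names, inputs, 1, output_names, 1, outputs));
  return 0;
}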

Urgency

No response

Platform

Linux

OS Version

ROCKY 8.5 (gcc-11.2.1, c++17)

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

CUDA 11.8

github-actions bot added the ep:TensorRT label on Jul 30, 2024
@yf711
Contributor

yf711 commented Jul 31, 2024

Could you try building ORT from this branch and see if it stops the crash?

@frenetj
Author

frenetj commented Aug 1, 2024

Hi Yifan,

Thanks for the quick fix; it works perfectly!

However, while compiling your branch with TensorRT 8.5.3, we got the following errors:

/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc: In member function 'onnxruntime::common::Status onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const onnxruntime::GraphViewer&, const onnxruntime::Node&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::vector<onnxruntime::NodeComputeInfo>&)':
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3055:17: error: 'class nvinfer1::IBuilderConfig' has no member named 'setHardwareCompatibilityLevel'
 3055 | trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3055:57: error: 'nvinfer1::HardwareCompatibilityLevel' has not been declared
 3055 | trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc: In lambda function:
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3644:21: error: 'class nvinfer1::IBuilderConfig' has no member named 'setHardwareCompatibilityLevel'
 3644 | trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3644:61: error: 'nvinfer1::HardwareCompatibilityLevel' has not been declared
 3644 | trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
      |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
gmake[2]: *** [CMakeFiles/onnxruntime_providers_tensorrt.dir/build.make:146: CMakeFiles/onnxruntime_providers_tensorrt.dir/git/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2267: CMakeFiles/onnxruntime_providers_tensorrt.dir/all] Error 2

We worked around this by guarding the calls to trt_config->setHardwareCompatibilityLevel with #if NV_TENSORRT_MAJOR >= 10:

diff --git a/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc b/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
index 2df4611743..b1e7147ea1 100644
--- a/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
+++ b/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
@@ -3051,12 +3051,13 @@ Status TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const GraphView
 
   std::string cache_hw_compat = "sm" + compute_capability;
   // Enable hardware compatility mode if assigned
+#if NV_TENSORRT_MAJOR >= 10
   if (engine_cache_enable_ && engine_hw_compatible_) {
     trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
     cache_hw_compat = "_sm80+";
     LOGS_DEFAULT(VERBOSE) << "[TensorRT EP] Hardware compatibility is enabled when loading and capturing engine cache.";
   }
-
+#endif
   // Name the engine cache based on GPU compute capacity and reduce the chance of loading an incompatible cache
   // Note: Engine cache generated on a GPU with large memory might not be loadable on a GPU with smaller memory, even if they share the same compute capacity
   const std::string cache_path_prefix = cache_path + cache_hw_compat;
@@ -3639,12 +3640,13 @@ Status TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const GraphView
     }
   }
 
+#if NV_TENSORRT_MAJOR >= 10
   // Enable hardware compatility mode if assigned
   if (trt_state->engine_hw_compatible) {
     trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
     LOGS_DEFAULT(INFO) << "[TensorRT EP] Re-generate engine with hardware compatibility enabled.";
   }
-
+#endif
   // Build engine
   std::unique_ptr<nvinfer1::IHostMemory> serialized_engine;
   {

Would it be possible for you to also make this change?

@frenetj
Author

frenetj commented Aug 1, 2024

Note that GitHub's formatting is not rendering the diff in the second part of the above comment properly. Please read it as plain text.

@yf711
Contributor

yf711 commented Aug 2, 2024

Hi @frenetj, ORT has supported TRT 8.6 since release 1.15 and has added features that are incompatible with older TRT 8.x versions.
Please see the TRT version requirements at https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html
We recommend using the latest TRT 10.x, as ORT will gradually drop support for TRT 8.6 in the future.

@frenetj
Author

frenetj commented Aug 8, 2024

Hello @yf711, using TRT 8.6 with this fix works perfectly. Thanks a lot!

@frenetj frenetj closed this as completed Aug 8, 2024
yf711 added a commit that referenced this issue Aug 9, 2024
…which was failed to generate trt_engine previously (#21621)

### Description
Add null_ptr check to avoid crash when running a session which previously failed to generate trt_engine.


### Motivation and Context

Reported and verified by
#21567
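
For readers following along, here is a standalone sketch of the guard pattern the PR describes. It is illustrative only: FakeEngine and ComputeSketch are stand-ins for nvinfer1::ICudaEngine and the TensorRT EP's per-node compute function, not the actual code from the fix.

// Illustrative only: shows why returning an error status beats dereferencing
// a null engine pointer when a previous Run() failed to build the engine.
#include <iostream>
#include <string>

struct Status {           // stand-in for onnxruntime::common::Status
  bool ok;
  std::string message;
};

struct FakeEngine {       // stand-in for nvinfer1::ICudaEngine
  int getNbIOTensors() const { return 2; }
};

Status ComputeSketch(const FakeEngine* trt_engine) {
  if (trt_engine == nullptr) {
    // Previous Run() failed to create the engine: report it instead of
    // crashing like frame #0 in the stack trace above.
    return {false, "TensorRT EP failed to create engine from network on a "
                   "previous Run(); no engine is available."};
  }
  return {true, "engine has " + std::to_string(trt_engine->getNbIOTensors()) + " IO tensors"};
}

int main() {
  std::cout << ComputeSketch(nullptr).message << "\n";  // error path, no crash
  FakeEngine engine;
  std::cout << ComputeSketch(&engine).message << "\n";  // normal path
  return 0;
}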
@frenetj
Author

frenetj commented Oct 24, 2024

Hello @yf711, the fix doesn't seem to have been integrated in the latest release (1.19.2).

@frenetj frenetj reopened this Oct 24, 2024
@yf711
Contributor

yf711 commented Oct 24, 2024

Hi @frenetj, thanks for the notice.
I just found that my fix didn't make it into 1.19, but it will be included in the upcoming 1.20 release, which is targeted for early next month.
You can also build from the rel-1.20.0 branch and check whether it works as expected in your case.
