Crash in TensorrtExecutionProvider when TensorRT EP fails to create engine from network #21567
Comments
Could you try building ORT from this branch and see if it stops the crash?
Hi Yifan, Thanks for the quick fix; it works perfectly! However, while compiling your branch with TensorRT 8.5.3 we got compile errors, which we fixed by adding `#if NV_TENSORRT_MAJOR >= 10` around the call to `trt_config->setHardwareCompatibilityLevel`:

```
diff --git a/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc b/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc
std::string cache_hw_compat = "sm" + compute_capability;
```
Note that GitHub's formatting is not showing the second part of the above comment properly. Please read it in standard text format.
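For readability, here is a hedged reconstruction of the guard described in the comment above. Only the `cache_hw_compat` line and the guarded call come from the comment; the surrounding condition, the member name, and the cache tag value are illustrative assumptions, not the exact ORT source:

```cpp
// Sketch of the version guard described above (not the exact ORT code).
// setHardwareCompatibilityLevel() does not exist in TensorRT 8.5.x,
// so the call is compiled only when building against a newer TensorRT.
std::string cache_hw_compat = "sm" + compute_capability;
#if NV_TENSORRT_MAJOR >= 10
if (engine_hw_compatible_) {  // illustrative member name, not the exact ORT field
  trt_config->setHardwareCompatibilityLevel(nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS);
  cache_hw_compat = "sm80+";  // illustrative cache tag for hardware-compatible engines
}
#endif
```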
Hi @frenetj ORT has supported TRT 8.6 since 1.15 and has added features that are incompatible with older TRT 8.x releases.
Hello @yf711 Using TRT 8.6 works perfectly with this fix. Thanks a lot!
Add null_ptr check to avoid crash when running session which was failed to generate trt_engine previously (#21621)

### Description
Add a null_ptr check to avoid a crash when running a session that previously failed to generate a trt_engine.

### Motivation and Context
Reported and verified by #21567
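For illustration, a minimal sketch of the kind of null check the PR describes, placed at the top of the TensorRT EP compute function. The exact placement, condition, and error message are assumptions, not a quote of the merged patch:

```cpp
// Sketch only: bail out instead of dereferencing an engine that was never
// built because an earlier Run() failed with
// "TensorRT EP failed to create engine from network."
auto* trt_state = reinterpret_cast<TensorrtFuncState*>(state);
if (trt_state->engine == nullptr || *(trt_state->engine) == nullptr) {
  return ORT_MAKE_STATUS(ONNXRUNTIME, EP_FAIL,
                         "TensorRT EP: no valid engine; a previous attempt to build it failed.");
}
nvinfer1::ICudaEngine* trt_engine = trt_state->engine->get();  // safe to use from here on
```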
Hello @yf711, the fix doesn't seem to have been integrated into the latest release (1.19.2).
Hi @frenetj, thanks for the notice.
Describe the issue
When the TensorRT EP fails to create an engine from the network and the client calls Run() again on the same session, the following crash occurs:
```
#0  0x00007efc5442df84 in nvinfer1::ICudaEngine::getNbIOTensors() const (this=0x0) at tensort/include/NvInferRuntime.h:2160
#1  0x00007efc54451cf8 in onnxruntime::TensorrtExecutionProvider::<lambda(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*)>::operator()(onnxruntime::FunctionState, const OrtApi *, OrtKernelContext *) const (__closure=0x7efbfb1d8098, state=0x7efbfc81bf80, api=0x7f02b6d0b2e0 <ort_api_1_to_18>, context=0x7fff94d9ce50) at onnxruntime-1.18.0/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:3395
#2  0x00007efc54487e8c in std::_Function_handler<onnxruntime::common::Status(void*, const OrtApi*, OrtKernelContext*), onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph(const onnxruntime::GraphViewer&, const onnxruntime::Node&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::unordered_map<std::__cxx11::basic_string<char>, long unsigned int>&, std::vector<onnxruntime::NodeComputeInfo>&)::<lambda(onnxruntime::FunctionState, const OrtApi*, OrtKernelContext*)> >::_M_invoke(const std::_Any_data &, void *&&, const OrtApi *&&, OrtKernelContext *&&) (__functor=..., __args#0=@0x7fff94d9cbb8: 0x7efbfc81bf80, __args#1=@0x7fff94d9cbb0: 0x7f02b6d0b2e0 <ort_api_1_to_18>, __args#2=@0x7fff94d9cba8: 0x7fff94d9ce50) at /usr/include/c++/8/bits/std_function.h:283
#3  0x00007f02b59addac in std::function<onnxruntime::common::Status (void*, OrtApi const*, OrtKernelContext*)>::operator()(void*, OrtApi const*, OrtKernelContext*) const (this=0x7efbfb1d8098, __args#0=0x7efbfc81bf80, __args#1=0x7f02b6d0b2e0 <ort_api_1_to_18>, __args#2=0x7fff94d9ce50) at /usr/include/c++/8/bits/std_function.h:687
#4  0x00007f02b59a76b9 in onnxruntime::FunctionKernel::Compute(onnxruntime::OpKernelContext*) const (this=0x7efc014e2c00, context=0x7fff94d9ce50) at onnxruntime-1.18.0/onnxruntime/core/framework/func_kernel.h:52
#5  0x00007f02b5ac7d5c in onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (ctx=..., idx=4937, stream_idx=0, terminate_flag=@0x2716f308: false, session_scope=...) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:495
#6  0x00007f02b5abef4c in onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (this=0x3587a8e0, ctx=..., stream_idx=0, session_scope=..., terminate_flag=@0x2716f308: false, continue_flag=@0x7fff94d9d51f: true) at onnxruntime-1.18.0/onnxruntime/core/framework/execution_steps.cc:73
#7  0x00007f02b5acb5a3 in onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) (stream_idx=0, ctx=..., session_scope=..., terminate_flag=@0x2716f308: false, since=0) at onnxruntime-1.18.0/onnxruntime/core/framework/stream_execution_context.cc:222
#8  0x00007f02b5ac827b in onnxruntime::<lambda()>::operator()(void) const (__closure=0x7efc017dc3b0) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:589
#9  0x00007f02b5ac992f in std::_Function_handler<void(), onnxruntime::ExecuteThePlan(const onnxruntime::SessionState&, gsl::span<int const>, gsl::span<const OrtValue>, gsl::span<int const>, std::vector<OrtValue>&, const std::unordered_map<long unsigned int, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)> >&, const onnxruntime::logging::Logger&, const onnxruntime::DeviceStreamCollection*, bool const&, bool, bool)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#10 0x00007f02b4e39dac in std::function<void ()>::operator()() const (this=0x7fff94d9dbf0) at /usr/include/c++/8/bits/std_function.h:687
#11 0x00007f02b4e1ad49 in onnxruntime::concurrency::ThreadPool::Schedule(onnxruntime::concurrency::ThreadPool*, std::function<void ()>) (tp=0x0, fn=...) at onnxruntime-1.18.0/include/onnxruntime/core/platform/threadpool.h:233
#12 0x00007f02b5ac8608 in onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<int const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection const*, bool const&, bool, bool) (session_state=..., feed_mlvalue_idxs=..., feeds=..., fetch_mlvalue_idxs=..., fetches=std::vector of length 2, capacity 2 = {...}, fetch_allocators=std::unordered_map with 0 elements, logger=..., device_streams=0x1dbb3080, terminate_flag=@0x2716f308: false, only_execute_path_to_fetches=false, single_thread_mode=true) at onnxruntime-1.18.0/onnxruntime/core/framework/sequential_executor.cc:588
#13 0x00007f02b5a68157 in onnxruntime::utils::ExecuteGraphImpl(const onnxruntime::SessionState &, const onnxruntime::FeedsFetchesManager &, gsl::span<OrtValue const, 18446744073709551615>, std::vector<OrtValue, std::allocator<OrtValue> > &, const std::unordered_map<long unsigned int, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)>, std::hash<long unsigned int>, std::equal_to<long unsigned int>, std::allocator<std::pair<long unsigned int const, std::function<onnxruntime::common::Status(const onnxruntime::TensorShape&, const OrtDevice&, OrtValue&, bool&)> > > > &, ExecutionMode, const bool &, const onnxruntime::logging::Logger &, onnxruntime::DeviceStreamCollection *, bool, onnxruntime::Stream *) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, fetch_allocators=std::unordered_map with 0 elements, execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x2716f308: false, logger=..., device_stream_collection=0x1dbb3080, only_execute_path_to_fetches=false, parent_stream=0x0) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:706
#14 0x00007f02b5a6878e in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollectionHolder&, bool, onnxruntime::Stream*) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, execution_mode=ORT_SEQUENTIAL, terminate_flag=@0x2716f308: false, logger=..., device_stream_collection_holder=..., only_execute_path_to_fetches=false, parent_stream=0x0) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:755
#15 0x00007f02b5a68868 in onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >&, ExecutionMode, OrtRunOptions const&, onnxruntime::DeviceStreamCollectionHolder&, onnxruntime::logging::Logger const&) (session_state=..., feeds_fetches_manager=..., feeds=..., fetches=std::vector of length 2, capacity 2 = {...}, execution_mode=ORT_SEQUENTIAL, run_options=..., device_stream_collection_holder=..., logger=...) at onnxruntime-1.18.0/onnxruntime/core/framework/utils.cc:782
#16 0x00007f02b4e33fd5 in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 18446744073709551615ul>, std::vector<OrtValue, std::allocator<OrtValue> >*, std::vector<OrtDevice, std::allocator<OrtDevice> > const*) (this=0x23f71cf0, run_options=..., feed_names=..., feeds=..., output_names=..., p_fetches=0x7fff94d9f1f0, p_fetches_device_info=0x0) at onnxruntime-1.18.0/onnxruntime/core/session/inference_session.cc:2531
#17 0x00007f02b4e351bc in onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue const* const, 18446744073709551615ul>, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue*, 18446744073709551615ul>) (this=0x23f71cf0, run_options=..., feed_names=..., feeds=..., fetch_names=..., fetches=...) at onnxruntime-1.18.0/onnxruntime/core/session/inference_session.cc:2659
#18 0x00007f02b4d42116 in OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) (sess=0x23f71cf0, run_options=0x2716f2e0, input_names=0x1b75aff0, input=0x7efc5550bba0, input_len=2, output_names=0x1dea9570, output_names_len=2, output=0x7efbf802c200) at onnxruntime-1.18.0/onnxruntime/core/session/onnxruntime_c_api.cc:831
```
To reproduce
Run inference on a model that is too large to be cached (or force the TensorRT EP to return the error "TensorRT EP failed to create engine from network.").
Try running the inference again on the same session.
--> crash
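For concreteness, a minimal sketch of this call pattern using the ONNX Runtime C API (compiled as C++). The model path, input shape, and tensor names are placeholders for whatever model triggers the engine-build failure:

```cpp
// Repro sketch: call Run() twice on a session whose TensorRT engine build fails.
#include <onnxruntime_c_api.h>
#include <cstdio>

// Small helper (not part of ORT) to report and release an OrtStatus.
static void check(const OrtApi* ort, OrtStatus* st) {
  if (st) { std::fprintf(stderr, "ORT error: %s\n", ort->GetErrorMessage(st)); ort->ReleaseStatus(st); }
}

int main() {
  const OrtApi* ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);
  OrtEnv* env = nullptr;
  check(ort, ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "repro", &env));
  OrtSessionOptions* opts = nullptr;
  check(ort, ort->CreateSessionOptions(&opts));
  OrtTensorRTProviderOptions trt_options = {};  // zero-initialized defaults; fine for a sketch
  check(ort, ort->SessionOptionsAppendExecutionProvider_TensorRT(opts, &trt_options));
  OrtSession* session = nullptr;
  check(ort, ort->CreateSession(env, "model.onnx", opts, &session));  // placeholder model path

  OrtAllocator* alloc = nullptr;
  check(ort, ort->GetAllocatorWithDefaultOptions(&alloc));
  int64_t shape[4] = {1, 3, 224, 224};  // placeholder input shape
  OrtValue* input = nullptr;
  check(ort, ort->CreateTensorAsOrtValue(alloc, shape, 4, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, &input));
  const char* in_names[] = {"input"};    // placeholder tensor names
  const char* out_names[] = {"output"};
  OrtValue* output = nullptr;

  // First Run(): TensorRT EP fails with "TensorRT EP failed to create engine from network."
  check(ort, ort->Run(session, nullptr, in_names, (const OrtValue* const*)&input, 1, out_names, 1, &output));
  // Second Run() on the same session: before the fix, this dereferences the null engine and crashes.
  check(ort, ort->Run(session, nullptr, in_names, (const OrtValue* const*)&input, 1, out_names, 1, &output));
  return 0;
}
```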
Urgency
No response
Platform
Linux
OS Version
Rocky Linux 8.5 (gcc 11.2.1, C++17)
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
C
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8