[TensorRT EP] Segmentation fault when concurrently loading model using TensorRT EP #20089
Labels: ep:TensorRT (issues related to TensorRT execution provider)
Comments
github-actions bot added the ep:TensorRT label on Mar 26, 2024.
@tanmayv25 Thanks for raising this issue.
chilo-ms added a commit that referenced this issue on Mar 27, 2024:
The `CreateTensorRTCustomOpDomainList()` function is not thread-safe due to its static variables, `created_custom_op_list` and `custom_op_domain`. This PR adds synchronization with a mutex. See issue: #20089
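For context, the synchronization pattern the fix describes looks roughly like the sketch below. The types and function body here are simplified assumptions for illustration, not the actual ONNX Runtime source:

```cpp
// Illustrative sketch only: mutex-guarded access to static state, as the fix
// describes. OrtCustomOpDomain is an opaque stand-in type, and the signature
// is a simplified assumption, not the real internal API.
#include <mutex>
#include <vector>

struct OrtCustomOpDomain;  // opaque stand-in

void CreateTensorRTCustomOpDomainList(std::vector<OrtCustomOpDomain*>& domain_list) {
  static std::mutex mutex;
  static bool created_custom_op_list = false;             // shared static state
  static std::vector<OrtCustomOpDomain*> custom_op_domain;

  // Serialize all access to the statics so that concurrent session creation
  // cannot race on their initialization (the root cause of the crash).
  std::lock_guard<std::mutex> lock(mutex);
  if (!created_custom_op_list) {
    // ... populate custom_op_domain from the TensorRT plugin registry ...
    created_custom_op_list = true;
  }
  domain_list = custom_op_domain;
}
```

A C++11 function-local static initializer ("magic statics") would also make the one-time setup thread-safe, but an explicit mutex additionally covers any later reads and writes of the shared list.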
YUNQIUGUO pushed a commit that referenced this issue on Mar 27, 2024, with the same commit message.
@chilo-ms I can confirm that the linked PR has fixed the issue. Thanks a lot!
@tanmayv25, thanks for verifying.
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this issue on May 7, 2024:
…#20093) The `CreateTensorRTCustomOpDomainList()` function is not thread-safe due to its static variables, `created_custom_op_list` and `custom_op_domain`. This PR adds synchronization with a mutex. See issue: microsoft#20089
Describe the issue
There seems to be a regression in the ONNX Runtime library within the ORT backend for Triton Inference Server when using the TensorRT execution provider.
We started observing a segmentation fault, apparently caused by memory corruption, when trying to load multiple sessions of the model concurrently. The failing test is specifically L0_onnx_optimization.
I have also written a small reproducer that uses the C API to load the models, similar to how Triton's ONNX Runtime backend loads them:
ort_trt_test.cc
Test Combinations and Results
The first argument of the binary sets how many ORT sessions will be loaded on the GPU.
The second argument controls whether the sessions are loaded concurrently: 0 means the sessions are loaded concurrently, while >0 means they are loaded one at a time.
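For context, a reproducer of this shape might look roughly like the sketch below. This is an illustrative approximation using the public ORT C API, not the attached ort_trt_test.cc; the model path `model.onnx` is a placeholder:

```cpp
// Illustrative sketch: create N sessions with the TensorRT EP, either
// concurrently (arg2 == 0) or one at a time (arg2 > 0).
#include <onnxruntime_c_api.h>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

static const OrtApi* g_api = nullptr;

static void Check(OrtStatus* status) {
  if (status != nullptr) {
    fprintf(stderr, "ORT error: %s\n", g_api->GetErrorMessage(status));
    g_api->ReleaseStatus(status);
    std::exit(1);
  }
}

static void LoadSession(OrtEnv* env) {
  OrtSessionOptions* opts = nullptr;
  Check(g_api->CreateSessionOptions(&opts));
  // Legacy helper that appends the TensorRT EP on device 0.
  Check(OrtSessionOptionsAppendExecutionProvider_Tensorrt(opts, 0));
  OrtSession* session = nullptr;
  Check(g_api->CreateSession(env, "model.onnx", opts, &session));  // placeholder model
  g_api->ReleaseSession(session);
  g_api->ReleaseSessionOptions(opts);
}

int main(int argc, char** argv) {
  int num_sessions = argc > 1 ? std::atoi(argv[1]) : 2;
  int sequential = argc > 2 ? std::atoi(argv[2]) : 0;

  g_api = OrtGetApiBase()->GetApi(ORT_API_VERSION);
  OrtEnv* env = nullptr;
  Check(g_api->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "trt_repro", &env));

  if (sequential) {
    for (int i = 0; i < num_sessions; ++i) LoadSession(env);
  } else {
    std::vector<std::thread> threads;
    for (int i = 0; i < num_sessions; ++i) threads.emplace_back(LoadSession, env);
    for (auto& t : threads) t.join();
  }
  g_api->ReleaseEnv(env);
  return 0;
}
```

Under this scheme, a run such as `./ort_trt_test 4 0` would attempt four concurrent session loads, matching the failing case described above.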
Additionally, a backtrace of the segmentation fault was captured.
To reproduce
Compile the described reproducer (ort_trt_test) and execute it with the CLI options mentioned above.
Urgency
The regression is quite serious and impacts users in production environments.
Platform
Linux
OS Version
5.15.0-89-generic
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.17.2
ONNX Runtime API
C
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
8.6.3.1+cuda12.2.2.009