
Is InferenceSession.Run thread-safe? #114

Closed
hanabi1224 opened this issue Dec 6, 2018 · 9 comments

Comments

@hanabi1224

Is it true that I can keep a single instance per model and call the Run method concurrently without problems? Or should I lock around Run, or make a pool of InferenceSessions?

@pranavsharma
Contributor

It's safe to invoke Run() on the same session object in multiple threads. No need for any external synchronization. This aspect is documented in the design doc. https://github.com/Microsoft/onnxruntime/blob/master/docs/HighLevelDesign.md
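To illustrate the pattern being confirmed here, a minimal sketch of concurrent Run calls on one shared session. `FakeSession` is a hypothetical stand-in (loading a real model is out of scope for this sketch); with the real onnxruntime Python API the equivalent call is `session.run(None, feeds)` on an `onnxruntime.InferenceSession`, and per the answer above no external lock is needed.

```python
from concurrent.futures import ThreadPoolExecutor

class FakeSession:
    """Hypothetical stand-in for onnxruntime.InferenceSession.
    The real session's Run() is documented as thread-safe, so the
    callers below share one instance with no synchronization."""
    def run(self, output_names, feeds):
        # A real session would execute the model; here we just echo the input.
        return [feeds["x"] * 2]

session = FakeSession()  # one shared session for one model

def infer(x):
    # Concurrent calls on the same session object: no external lock needed.
    return session.run(None, {"x": x})[0]

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(infer, range(100)))

print(results[:5])  # [0, 2, 4, 6, 8]
```

The alternative designs the question mentions (a lock around Run, or a pool of sessions) would also be correct, but per this answer they cost extra memory or throughput for no safety benefit.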

@hanabi1224
Author

@pranavsharma Thanks! I forgot to mention that the scope of my question was the .NET NuGet package; I just want to double-check that it's still true there (assuming it's just a slim wrapper around the C API).

@yufenglee
Member

Yes, it is also true for C#.

@harrysummer
Contributor

Hi @pranavsharma, sorry for jumping into the thread, but I have a concern about the thread safety of the MKL-DNN execution provider. As this comment mentioned, MKL-DNN has to be built with MKLDNN_ENABLE_CONCURRENT_EXEC defined to make it thread-safe, but I didn't see this in ONNX Runtime.

@jomach

jomach commented Mar 9, 2020

I'm trying to do inference with Spark and I'm getting this error:

2020-03-09 15:31:55.000263279 [E:onnxruntime:ort-java, cuda_call.cc:103 CudaCall] CUDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=f7d791cffbe6 ; expr=cudnnSetTensorNdDescriptor(tensor_, dataType, static_cast<int>(rank), dims.data(), strides.data());
2020-03-09 15:31:55.000318077 [E:onnxruntime:, sequential_executor.cc:183 Execute] Non-zero status code returned while running ConvTranspose node. Name:'6814' Status Message: CUDNN error executing cudnnSetTensorNdDescriptor(tensor_, dataType, static_cast<int>(rank), dims.data(), strides.data())

Running it on a single thread seems to work. Not sure if this has something to do with thread safety; it only fails about once every 180 images.

@r0l1

r0l1 commented Jan 25, 2024

This is no longer true. We experienced concurrency bugs with the TensorRT execution provider when there is no mutex lock around the session's Run call.
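A minimal sketch of the workaround described in this comment: serializing Run with a mutex. `FakeTrtSession` is a hypothetical stand-in for a session built with the TensorRT execution provider; the point is only the locking pattern, which trades parallelism for safety if the provider turns out not to be re-entrant.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeTrtSession:
    """Hypothetical stand-in for a session using the TensorRT EP."""
    def run(self, output_names, feeds):
        return [feeds["x"] + 1]

session = FakeTrtSession()
run_lock = threading.Lock()  # workaround: serialize all Run calls

def infer(x):
    # Hold the mutex around Run so only one inference executes at a time.
    with run_lock:
        return session.run(None, {"x": x})[0]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(infer, range(10)))

print(results)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

An alternative with more parallelism would be a pool of sessions, one per worker thread, at the cost of duplicating model memory per session.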

@pranavsharma
Contributor

This is no longer true. We experienced concurrency bugs with the TensorRT execution provider when there is no mutex lock around the session's Run call.

cc @jywu-msft

@jywu-msft
Member

jywu-msft commented Jan 25, 2024

This is no longer true. We experienced concurrency bugs with the TensorRT execution provider when there is no mutex lock around the session's Run call.

@r0l1 which version of ONNX Runtime / the TensorRT EP did you encounter this on? (Did you build from source or use a prebuilt package?) There was a concurrency bug/regression that was fixed a few months ago; I want to confirm you are no longer encountering the issue with the latest versions.

@r0l1

r0l1 commented Jan 25, 2024

@jywu-msft thank you for the fast response. I opened a new issue here: #19275
