[Performance] In ONNX Runtime, the CPU consumption does not scale linearly with the number of threads #19384
Comments
As you only have 4 cores, why do you create 16 threads?
First, you're using a version of ORT that is 4 releases old. Second, as Yufeng said above, it's not clear why you have 16 threads on a 4-core machine. What is RTF?
Thanks for your reply. I also found that when I use Docker to create several containers, each running my onnxruntime program on different processors with different CPU core IDs, the CPU load of the containers influences one another as the number of containers increases. With two containers, the CPU usage of each container is around 80%; with three, it is around 90%; with four, it is around 100%. @pranavsharma RTF (Real Time Factor) = total_time_taken / total_audio, which serves as a performance evaluation metric. The lower the RTF, the better the performance.
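For reference, the RTF figures quoted in this thread can be reproduced with a few lines of C++. This is a minimal sketch that assumes the usual convention of processing time divided by audio duration (the one under which a lower value is better); `real_time_factor` and `measure_rtf` are hypothetical helper names, not part of ONNX Runtime.

```cpp
#include <cassert>
#include <chrono>

// Real Time Factor under the assumed convention: seconds of processing
// spent per second of audio. Values below 1.0 are faster than real time.
double real_time_factor(double processing_seconds, double audio_seconds) {
    assert(audio_seconds > 0.0);
    return processing_seconds / audio_seconds;
}

// Hypothetical helper: times any callable (for example one inference
// request) with a monotonic clock and returns the resulting RTF.
template <typename Fn>
double measure_rtf(Fn&& decode, double audio_seconds) {
    const auto start = std::chrono::steady_clock::now();
    decode();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return real_time_factor(elapsed.count(), audio_seconds);
}
```

Measuring wall-clock time per request, rather than CPU time, is what makes contention between threads or containers show up as a higher RTF.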
@yufenglee Hi, we measured the CPU cycles of … Are there any ORT configuration options that can eliminate this interference between containers?
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Hi, before this issue gets closed.
In my situation, it was due to the NUMA architecture. A session option may help, such as enable_spinning_lock.
Thanks @poor1017, that was a great hint.
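One way to check whether NUMA placement is a factor, independent of any session option, is to pin each request thread to a fixed core and compare the RTF. The following is a minimal Linux-only sketch using glibc's `pthread_setaffinity_np`; the helper names are hypothetical, and which cores belong to which NUMA node has to be looked up separately for the machine in question.

```cpp
#include <pthread.h>
#include <sched.h>

// Returns the first CPU id the calling thread may run on, or -1 on error.
// Inside a container this respects the cpuset the container was given.
int first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) return cpu;
    }
    return -1;
}

// Pins the calling thread to a single CPU; returns true on success.
bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask) == 0;
}
```

Each worker would call `pin_current_thread` once before it starts issuing requests. If the RTF improves when all workers sit on cores of the same node, cross-node memory access is a likely contributor.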
Hello, I have run into a problem with onnxruntime in C++.
The program has only one ONNX model; each time a thread starts, the program issues a new session->run() call. I found that when 4 threads handle 4 requests, it costs 1 CPU with an RTF of 1.0.
When limiting the CPU cores to 4 and using 16 threads to handle 16 requests, the RTF ranges from 2.19 to 3.7, and the average RTF is around 3.2.
The session options are:
session_options_.SetIntraOpNumThreads(1);
Following the issue "OnnxRuntime multithreading efficiency is poor", I changed the session options to:
session_options_.SetIntraOpNumThreads(1);
session_options_.SetInterOpNumThreads(1);
session_options_.DisableMemPattern();
session_options_.SetExecutionMode(ORT_SEQUENTIAL);
The average RTF is around 2.4.
The deadline is looming, and time is running out for me 😢
How can I further optimize to achieve a more linear relationship between CPU consumption and concurrency? The ideal RTF is around 1.0 (16 threads handling 16 requests with 4 CPUs).
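An alternative to one thread per request is a bounded pool: 16 requests are queued and only as many workers as there are cores run at once, which avoids oversubscribing 4 cores with 16 runnable threads. The following is a minimal standard-library sketch, not the ONNX Runtime API; `run_bounded` is a hypothetical name, and each task stands in for one `session->run()` call.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs every task on at most `workers` threads and returns how many
// tasks completed.
std::size_t run_bounded(const std::vector<std::function<void()>>& tasks,
                        unsigned workers) {
    assert(workers > 0);
    std::atomic<std::size_t> next{0};  // index of the next unclaimed task
    std::atomic<std::size_t> done{0};  // number of finished tasks
    auto worker = [&] {
        // Each worker keeps claiming the next index until none remain.
        for (std::size_t i = next.fetch_add(1); i < tasks.size();
             i = next.fetch_add(1)) {
            tasks[i]();
            done.fetch_add(1);
        }
    };
    // Never start more threads than there are tasks.
    const std::size_t n = std::min<std::size_t>(workers, tasks.size());
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < n; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return done.load();
}
```

With 4 cores, `run_bounded(tasks, 4)` keeps 4 requests in flight at a time. Whether this lowers the average RTF for a given model is something to measure, not assume, since queued requests wait longer before they start.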
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Version or Commit ID
1.12.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU