[Performance] High thread contention in BFCArena #21916
Labels
core runtime, performance, stale
Describe the issue
Hi,
I've noticed that a significant chunk of time is spent on locks inside onnxruntime, specifically inside `BFCArena::AllocateRawInternal` (onnxruntime/onnxruntime/core/framework/bfc_arena.cc, line 328 in 0167338).
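To make the contention concrete, here is a toy model, not ONNX Runtime code. It assumes, as the profile suggests, that the allocation path is serialized by one lock. The hypothetical `SharedToyArena` only does bookkeeping (no real memory is handed out) and counts how often a thread found the mutex already held. Note that `try_lock` may also fail spuriously, so the `contended` figure is approximate.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct ArenaStats {
  std::size_t requests = 0;      // allocations served
  std::size_t contended = 0;     // allocations that found the lock busy
  std::size_t bytes_served = 0;  // bookkeeping only
};

// Toy model (NOT ONNX Runtime code): the whole allocation path runs under a
// single mutex, so every caller thread serializes on the same lock.
class SharedToyArena {
 public:
  void Allocate(std::size_t bytes) {
    // Try without blocking first so we can count how often the lock was busy.
    std::unique_lock<std::mutex> guard(lock_, std::try_to_lock);
    if (!guard.owns_lock()) {
      guard.lock();  // blocking acquire; on Linux a contended lock can end in a futex wait
      ++stats_.contended;
    }
    ++stats_.requests;
    stats_.bytes_served += bytes;
  }

  ArenaStats Stats() {
    std::lock_guard<std::mutex> guard(lock_);
    return stats_;
  }

 private:
  std::mutex lock_;  // the single point of serialization
  ArenaStats stats_;
};

// Runs `threads` workers that each perform `allocs_per_thread` allocations
// against ONE shared arena, mimicking many threads calling Run on one session.
ArenaStats HammerSharedArena(int threads, int allocs_per_thread) {
  SharedToyArena arena;
  std::vector<std::thread> workers;
  workers.reserve(static_cast<std::size_t>(threads));
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&arena, allocs_per_thread] {
      for (int i = 0; i < allocs_per_thread; ++i) arena.Allocate(256);
    });
  }
  for (std::thread& w : workers) w.join();
  ArenaStats stats = arena.Stats();
  // Every request went through the same lock exactly once.
  assert(stats.requests == static_cast<std::size_t>(threads) * allocs_per_thread);
  return stats;
}
```

Calling `HammerSharedArena(8, 10000)` pushes all 80,000 requests through the same mutex; the `contended` count varies from run to run. Time spent blocking on such a lock is what would show up as `futex` time in a syscall trace like the one reported below.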
The conditions are as follows:
- a single `Session` object in the whole application
- many threads calling `Session.Run` at the same time
- `intra_threads` and `inter_threads` set to 1
- `execution_mode` set to `SEQUENTIAL`
- arena allocator enabled, memory pattern optimization enabled

See flamegraph screenshots below:
[Flamegraph screenshots]
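Under these conditions, concurrent `Run` calls appear to share the session's arena allocator, which is what the profile points at. The alternatives raised further down (a `Session` per worker thread, or an arena per `thread_id`) can be sketched with the hypothetical types `LocalArena` and `PerThreadArenas`. This illustrates the idea only and is not how `BFCArena` is implemented: the shared registry lock is taken once per worker, and the hot allocation loop touches only thread-owned state.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical per-thread arena: only its owning thread touches it, so its
// allocation path needs no lock (bookkeeping only, no real memory).
struct LocalArena {
  std::size_t requests = 0;
  std::size_t bytes_served = 0;
  void Allocate(std::size_t bytes) {
    ++requests;
    bytes_served += bytes;
  }
};

// Hypothetical registry mapping std::thread::id -> arena. Its mutex is taken
// once per worker (in Acquire), not once per allocation.
class PerThreadArenas {
 public:
  LocalArena& Acquire() {
    std::lock_guard<std::mutex> guard(registry_lock_);
    ++registry_lookups_;
    std::unique_ptr<LocalArena>& slot = arenas_[std::this_thread::get_id()];
    if (!slot) slot = std::make_unique<LocalArena>();
    // Heap-allocated, so later map insertions never move or touch this arena.
    return *slot;
  }

  // The accessors below are meant to be called after all workers have joined.
  std::size_t ArenaCount() const { return arenas_.size(); }
  std::size_t RegistryLookups() const { return registry_lookups_; }
  std::size_t TotalRequests() const {
    std::size_t total = 0;
    for (const auto& entry : arenas_) total += entry.second->requests;
    return total;
  }

 private:
  std::mutex registry_lock_;
  std::unordered_map<std::thread::id, std::unique_ptr<LocalArena>> arenas_;
  std::size_t registry_lookups_ = 0;
};

struct PerThreadStats {
  std::size_t arenas;
  std::size_t registry_lookups;
  std::size_t requests;
};

// Each worker grabs its own arena once, then allocates without any shared lock.
PerThreadStats HammerPerThreadArenas(int threads, int allocs_per_thread) {
  PerThreadArenas registry;
  std::vector<std::thread> workers;
  workers.reserve(static_cast<std::size_t>(threads));
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&registry, allocs_per_thread] {
      LocalArena& arena = registry.Acquire();  // one registry lock per worker
      for (int i = 0; i < allocs_per_thread; ++i) arena.Allocate(256);
    });
  }
  for (std::thread& w : workers) w.join();
  PerThreadStats stats{registry.ArenaCount(), registry.RegistryLookups(),
                       registry.TotalRequests()};
  assert(stats.requests == static_cast<std::size_t>(threads) * allocs_per_thread);
  return stats;
}
```

With 8 workers doing 10,000 allocations each, this creates 8 arenas and takes the registry lock 8 times instead of 80,000. In a real allocator the cost would be memory, since each per-thread arena would keep its own cached chunks; that matches the increased memory consumption the reporter already anticipates.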
`strace` shows that 92% of the application time is spent in `futex` calls:
[strace output]

Is this an expected `BFCArena` limitation, or is something misconfigured on my side?

I'm expecting that having a `Session` object per worker thread should eliminate the contention. However, I've seen developers here discourage setups like this. Why? What are the drawbacks? I'm assuming increased memory consumption (which is fine for me); is there anything else?

And if this is indeed an expected limitation, I'd say it needs some improvement. For example, a caller could pass their own `BFCArena` instance to `Session.Run()`, or `BFCArena` could track each `thread_id` and keep an array of arenas, one per thread.

To reproduce
Initialize a single `Session` with the following settings:
- `intra_threads` set to 1
- `inter_threads` set to 1
- `execution_mode` set to `SEQUENTIAL`

Then, call `Session.Run` from many threads concurrently.

Urgency
No response
Platform
Linux
OS Version
NixOS, Gentoo
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No