@zwyao,
The thread-safety issue in the self-attention FusedMHARunnerFP16v2 was fixed in #21420. There was another fix for cross-attention.
The bug was resolved in the 1.19.0 release. Please try 1.19.2.
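If it helps, here is a minimal sketch (not from this issue) for confirming which ONNX Runtime build is actually linked at run time, so you can verify the process really picked up 1.19.x after upgrading; it only uses the public C API version query:

```cpp
#include <cstdio>
#include <onnxruntime_c_api.h>

int main() {
  // Print the version of the ONNX Runtime library that is linked in.
  // The fused-MHA thread-safety fix should be present in 1.19.0 and later.
  const char* version = OrtGetApiBase()->GetVersionString();
  std::printf("onnxruntime version: %s\n", version);
  return 0;
}
```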
Describe the issue
In my BERT model, when I use head-size == 32, the attention CUDA kernel causes an ONNX Runtime core dump; the error message says "CUDA illegal memory access was encountered".
I found that the reason is that FusedMHARunnerFP16v2 does not support concurrent running.
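For context, the failure mode is easiest to see on a simplified, hypothetical runner (assumed names; this is not the actual onnxruntime code and not necessarily how the upstream fix works): if a single shared runner object stores per-call launch parameters in member state, two threads running attention at the same time can interleave the setup and the kernel launch, so one launch reads parameters written for the other thread's call and the kernel indexes out of bounds.

```cpp
#include <mutex>

// Hypothetical stand-in for a shared fused-attention runner whose setup step
// writes member state (e.g. the current sequence length) before the launch.
class SharedFusedRunner {
 public:
  void Run(int sequence_length /*, device pointers, stream, ... */) {
    // Without this lock, thread A's launch() can observe the state written by
    // thread B's setup(), which is the kind of mismatch that shows up as a
    // "CUDA illegal memory access" on the device.
    std::lock_guard<std::mutex> guard(mutex_);
    setup(sequence_length);
    launch();
  }

 private:
  void setup(int sequence_length) { sequence_length_ = sequence_length; }
  void launch() { /* launch the CUDA kernel using sequence_length_ ... */ }

  std::mutex mutex_;
  int sequence_length_ = 0;
};
```

Serializing the setup-plus-launch section (or making the state per call / per thread) removes the race; which of these the actual fix uses is not shown here.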
To reproduce
attention_bug_fix.txt
This is my fix code.
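For reproducing the concurrency issue itself, here is a rough sketch of the kind of workload that drives FusedMHARunnerFP16v2 from several host threads at once: multiple threads calling Run() on one shared session with the CUDA execution provider. The model path, input/output names, and shapes below are placeholders for a typical fp16 BERT export with head size 32, not taken from my actual model.

```cpp
#include <onnxruntime_cxx_api.h>
#include <array>
#include <thread>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "bert");
  Ort::SessionOptions so;
  OrtCUDAProviderOptions cuda_options{};       // default CUDA EP settings
  so.AppendExecutionProvider_CUDA(cuda_options);

  // Placeholder path: an fp16 BERT whose per-head hidden size is 32.
  Ort::Session session(env, "bert_fp16.onnx", so);

  const int64_t batch = 1, seq = 128;
  std::array<int64_t, 2> shape{batch, seq};
  std::vector<int64_t> input_ids(batch * seq, 101);
  std::vector<int64_t> attention_mask(batch * seq, 1);
  std::vector<int64_t> token_type_ids(batch * seq, 0);
  auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);

  // Placeholder input/output names for a typical BERT export.
  const char* input_names[] = {"input_ids", "attention_mask", "token_type_ids"};
  const char* output_names[] = {"last_hidden_state"};

  auto worker = [&]() {
    // Each thread builds its own Ort::Value wrappers but shares the session;
    // concurrent Run() calls on one session are allowed by the API.
    std::vector<Ort::Value> inputs;
    inputs.push_back(Ort::Value::CreateTensor<int64_t>(
        mem, input_ids.data(), input_ids.size(), shape.data(), shape.size()));
    inputs.push_back(Ort::Value::CreateTensor<int64_t>(
        mem, attention_mask.data(), attention_mask.size(), shape.data(), shape.size()));
    inputs.push_back(Ort::Value::CreateTensor<int64_t>(
        mem, token_type_ids.data(), token_type_ids.size(), shape.data(), shape.size()));
    for (int i = 0; i < 100; ++i) {
      auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names,
                                 inputs.data(), inputs.size(), output_names, 1);
    }
  };

  std::vector<std::thread> threads;
  for (int t = 0; t < 4; ++t) threads.emplace_back(worker);
  for (auto& th : threads) th.join();
  return 0;
}
```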
Urgency
No response
Platform
Linux
OS Version
1.18.0
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18.0 master
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response