Issue when converting Whisper using --collect_cross_qk on CPU #18216
Comments
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
@shubhambhokare1 @kunal-vaishnavi maybe this is something you are familiar with
I was trying to look a little further into it. It seems that Cross QK support requires the model to be compiled with DecoderMaskedMultiHeadAttention, which is implemented only for CUDA.
Cross QK support is added in this PR as part of the
DecoderMaskedMultiHeadAttention is used specifically for CUDA. Cross QK support can be added to other attention ops that run on CPU.
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
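One way to confirm whether an exported model depends on the CUDA-only op before trying to run it on CPU is to list the op types in its graph, including nested subgraphs. The following is a minimal sketch, not part of ONNX Runtime: the helper names (`iter_op_types`, `find_cuda_only_ops`, `fake_graph`, `fake_node`) are hypothetical, the set of CUDA-only ops is an assumption taken from this thread, and the sample graph is stand-in data rather than a real Whisper export. The walker only relies on the `node`, `op_type`, `attribute`, `g`, and `graphs` fields that ONNX protobuf graphs expose.

```python
from types import SimpleNamespace

# Assumption taken from this thread: this op only has a CUDA kernel.
CUDA_ONLY_OPS = {"DecoderMaskedMultiHeadAttention"}


def iter_op_types(graph):
    """Yield the op_type of every node in `graph`, descending into subgraphs."""
    for node in graph.node:
        yield node.op_type
        for attr in node.attribute:
            # Attribute holding a single subgraph (empty for other attributes).
            yield from iter_op_types(attr.g)
            # Attribute holding a list of subgraphs.
            for subgraph in attr.graphs:
                yield from iter_op_types(subgraph)


def find_cuda_only_ops(graph, cuda_only=CUDA_ONLY_OPS):
    """Return the sorted op types in `graph` that are listed as CUDA-only."""
    return sorted(set(iter_op_types(graph)) & set(cuda_only))


# Stand-in objects that mimic the protobuf fields used above (illustrative only).
def fake_graph(*nodes):
    return SimpleNamespace(node=list(nodes))


def fake_node(op_type, *subgraphs):
    attrs = [SimpleNamespace(g=sub, graphs=[]) for sub in subgraphs]
    return SimpleNamespace(op_type=op_type, attribute=attrs)


# A decoder subgraph nested inside an outer search node, as a toy example.
decoder = fake_graph(
    fake_node("MatMul"),
    fake_node("DecoderMaskedMultiHeadAttention"),
)
model_graph = fake_graph(fake_node("BeamSearch", decoder))

print(find_cuda_only_ops(model_graph))
```

With the `onnx` package installed, the same function should accept a real model via `find_cuda_only_ops(onnx.load(path).graph)`, where `path` is a placeholder for the exported file. A non-empty result would suggest the model cannot run on the default CPU execution provider as exported.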
Describe the issue
I am using the nightly build of ONNX Runtime to convert Whisper to ONNX. I specifically want the model's cross QK output, which I eventually plan to use for timestamps, and I am converting the model to run on CPU. When I convert it, a runtime exception occurs:
When looking into it, it seems that DecoderMaskedMultiHeadAttention is only used when the --use_gpu flag is enabled, as is cross QK.
Is there any way I can build and run this model on the CPU?
To reproduce
Here is the exact command I ran. Use the latest version of ONNX Runtime.
Urgency
No response
Platform
Windows
OS Version
Windows 10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
#18206
ONNX Runtime API
Other / Unknown
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response