
com.microsoft.MultiHeadAttention is unsupported #196

Open
music-dino opened this issue Oct 24, 2024 · 5 comments
@music-dino

ROCm#3425

@marko-fabo-htec

The computation details of Multi-Head Attention are described in this paper:
https://arxiv.org/abs/1706.03762
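For reference, a minimal NumPy sketch of the multi-head attention computation from that paper (softmax(QK^T / sqrt(d_head))V per head). This assumes the input projections have already been applied; it is an illustration of the math, not the MIGraphX implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def multi_head_attention(q, k, v, num_heads):
    """Scaled dot-product attention over num_heads heads.

    q: (batch, seq_q, d_model); k, v: (batch, seq_kv, d_model).
    Input/output projection weights are assumed to be applied elsewhere.
    """
    batch, seq_q, d_model = q.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    def split_heads(x):
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        b, s, _ = x.shape
        return x.reshape(b, s, num_heads, d_head).transpose(0, 2, 1, 3)

    qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_head)) V
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    out = softmax(scores) @ vh
    # Merge heads: (batch, heads, seq_q, d_head) -> (batch, seq_q, d_model)
    return out.transpose(0, 2, 1, 3).reshape(batch, seq_q, d_model)
```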

@marko-fabo-htec

An example of how to implement the behavior of the MultiHeadAttention operator:
microsoft/onnxruntime#19924

Useful articles about Transformers and Attention:
https://towardsdatascience.com/transformers-explained-visually-part-1-overview-of-functionality-95a6dd460452
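The operator also accepts packed input variants (see the mha_kv_packed / mha_qkv_packed benchmark cases below). A small sketch of unpacking a packed QKV tensor into separate Q, K, V tensors; the (batch, seq, num_heads, 3, head_size) layout used here is an assumption, so check the com.microsoft.MultiHeadAttention spec for the exact layout your runtime expects:

```python
import numpy as np

def unpack_qkv(qkv_packed):
    """Split a packed QKV tensor into separate Q, K, V tensors.

    Assumed layout: (batch, seq, num_heads, 3, head_size).
    Returns three tensors of shape (batch, seq, num_heads * head_size).
    """
    b, s, h, three, d = qkv_packed.shape
    assert three == 3, "expected Q, K, V packed along the fourth axis"
    # Slice out each projection along the packing axis.
    q, k, v = (qkv_packed[..., i, :] for i in range(3))
    # (batch, seq, heads, head_size) -> (batch, seq, heads * head_size)
    merge = lambda x: x.reshape(b, s, h * d)
    return merge(q), merge(k), merge(v)
```

After unpacking, the three tensors can feed the ordinary (unpacked) attention path, which is one way a runtime can reuse a single kernel for all input variants.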

@marko-fabo-htec

PR: ROCm#3650

@marko-fabo-htec

Benchmark results:

| benchmark      | REF         | CPU         | GPU         |
| -------------- | ----------- | ----------- | ----------- |
| mha            | 0.539452 ms | 0.841959 ms | 0.270837 ms |
| mha_cross      | 0.440712 ms | 0.753624 ms | 0.285106 ms |
| mha_kv_packed  | 1.121980 ms | 1.454690 ms | 0.268043 ms |
| mha_qkv_packed | 1.375370 ms | 1.740890 ms | 0.264313 ms |
