Extend Attention Bias Broadcast Support #21710

Merged
14 commits merged into main from tlwu/mha_attn_bias on Aug 16, 2024

Conversation

tianleiwu (Contributor) commented on Aug 12, 2024

Description

Previously, MultiHeadAttention supported a relative position bias of shape [1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention supported only [1, N, S, T]. This PR extends the support to [1, N, S, T], [B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for the CUDA and CPU EPs.
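
The broadcast semantics can be illustrated with a small NumPy sketch (an illustration only, not code from this repository); B is the batch size, N the number of heads, S the sequence length, and T the total sequence length, matching the shapes above:

```python
# Minimal NumPy sketch (illustration only): an attention bias with a size-1
# batch and/or head dimension broadcasts against QK^T scores of shape
# [B, N, S, T] before softmax.
import numpy as np

B, N, S, T = 2, 4, 3, 5
scores = np.random.rand(B, N, S, T).astype(np.float32)  # Q @ K^T / sqrt(d)

for bias_shape in [(1, N, S, T), (B, N, S, T), (B, 1, S, T), (1, 1, S, T)]:
    attention_bias = np.random.rand(*bias_shape).astype(np.float32)
    biased = scores + attention_bias            # size-1 dims broadcast to B and N
    probs = np.exp(biased - biased.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)  # softmax over T
    assert probs.shape == (B, N, S, T)
```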

  • Rename the "relative position bias" input to "attention bias", since it can also carry other kinds of bias such as ALiBi (Attention with Linear Biases) or an attention mask.
  • Update the unfused kernel to support broadcasting the 2nd dimension of the attention bias (see the indexing sketch after this list).
  • Update efficient attention to support broadcasting the 2nd dimension of the attention bias.
  • Update the operators (MultiHeadAttention, DecoderMaskedMultiHeadAttention, Attention, PackedAttention, PackedMultiHeadAttention) to support broadcasting the attention bias on the CUDA and CPU EPs.
  • Update ROCm, DML and WebGPU naming to be consistent. (Note that these EPs do not support broadcasting attention_bias for now.)
  • Add attention bias tests for MultiHeadAttention.
  • Update the operator documentation.
  • Update the benchmark script.
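
The following is a rough, hypothetical sketch of the index arithmetic a kernel needs once the bias batch and/or head dimension may be 1; the function and variable names are illustrative only and are not taken from the actual CUDA/CPU implementation:

```python
# Hypothetical sketch: compute the flat offset into an attention bias tensor of
# shape [B_bias, N_bias, S, T], where B_bias and N_bias are either 1 (broadcast)
# or the full batch size B / head count N.
def bias_offset(b, n, s, t, B_bias, N_bias, S, T):
    bb = b if B_bias > 1 else 0  # broadcast over batch when B_bias == 1
    nn = n if N_bias > 1 else 0  # broadcast over heads when N_bias == 1
    return ((bb * N_bias + nn) * S + s) * T + t
```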

Other changes:

  • Fix some checks in multihead-attention.ts
  • Add helper functions to dump tensors given dimensions.

Motivation and Context

tianleiwu marked this pull request as draft on August 12, 2024 19:32
tianleiwu marked this pull request as ready for review on August 15, 2024 15:09
tianleiwu requested a review from a team as a code owner on August 15, 2024 15:09
tianleiwu requested a review from wangyems on August 16, 2024 15:52
tianleiwu merged commit d79e3c5 into main on Aug 16, 2024
95 of 97 checks passed
tianleiwu deleted the tlwu/mha_attn_bias branch on August 16, 2024 22:40