Support causal flash attention #2425

Open

jopperm wants to merge 3 commits into main

Conversation

jopperm (Contributor) commented on Oct 4, 2024

This PR adds support for causal FA (a brief reference sketch of the causal-masking semantics follows the list):

  • Keeps the encoding on row-vector tensor operations, as these must be left untouched when lowering to the SIMT program.
  • Extends the pattern-matching helper that determines whether a tensor is transposed so that it also looks through advance operations. (The second attention loop uses a transposed tensor pointer that is tt.advance'd between the loops.)
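
As a point of reference (not part of this PR's changes), the semantics of Causal = True can be sketched with a plain PyTorch implementation of single-head causal attention; the shapes, names, and 1/sqrt(d) scaling below are illustrative assumptions following the usual attention convention:

```python
import math
import torch

def causal_attention_reference(q, k, v):
    """Plain (non-flash) causal attention for a single head.

    q, k, v: tensors of shape (seq_len, head_dim).
    Query position i may only attend to key positions j <= i.
    """
    seq_len, head_dim = q.shape
    scores = (q @ k.T) / math.sqrt(head_dim)
    # A lower-triangular mask enforces the causal constraint.
    causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~causal_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Example usage:
# q = k = v = torch.randn(128, 64)
# out = causal_attention_reference(q, k, v)
```

A flash-attention kernel computes the same result blockwise, applying the causal mask per tile instead of materializing the full score matrix.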

Signed-off-by: Julian Oppermann <[email protected]>
jopperm self-assigned this on Oct 4, 2024
jopperm (Contributor, Author) commented on Oct 4, 2024

For D_HEAD=128, we're getting tt.make_range ops that are smaller than the sub-group size; I don't know how to lower these yet. Update: after checking the codegen, it seems that no special handling is needed.

jopperm changed the title from "(Almost) support causal flash attention" to "Support causal flash attention" on Oct 4, 2024
Development

Successfully merging this pull request may close these issues.

[#6 Attention Performance] extend attention support for Causal = True