[#6 Attention Performance] extend attention support for Causal = True #1102
Comments
Update: I'm actively working on this. What I've found so far is that enabling causal masking affects the propagation of layout information in the kernel function. There are now two dependent loops, and the current block's indices are evaluated in the computation, so we need to support additional operations.
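The "two dependent loops" structure mentioned above can be sketched in plain NumPy (a hypothetical simplification for illustration, not the actual kernel code): the inner loop's trip count depends on the outer query-block index, and the block indices feed into the mask computation.

```python
import numpy as np

def causal_attention(q, k, v, block=4):
    """Block-wise attention with a causal mask (illustrative sketch).

    For each query block, the inner loop only visits key blocks at or
    before it, so the loops are dependent: the inner trip count is a
    function of the outer block index i.
    """
    n, d = q.shape
    out = np.zeros_like(q)
    for i in range(0, n, block):              # outer loop over query blocks
        qi = q[i:i + block]
        score_blocks = []
        for j in range(0, i + block, block):  # inner loop depends on i
            kj = k[j:j + block]
            s = qi @ kj.T / np.sqrt(d)
            # The current block indices enter the computation here:
            # mask out positions where the key index exceeds the query index.
            rows = np.arange(i, i + qi.shape[0])[:, None]
            cols = np.arange(j, j + kj.shape[0])[None, :]
            score_blocks.append(np.where(cols <= rows, s, -np.inf))
        s_full = np.concatenate(score_blocks, axis=1)
        p = np.exp(s_full - s_full.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        out[i:i + block] = p @ v[:s_full.shape[1]]
    return out
```

In a real kernel the same index arithmetic happens per tile, which is why the layout of the block indices has to propagate through the masking and softmax operations.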
Update: Still WIP. Locally, I have extended the …
Update: Still WIP. I'm investigating how to handle "row vectors" (e.g. …).
WIP.
Update: The PR is up; still working on tests.
Update: The linked PRs are under review.
Update (for 9/9/24): Two outstanding PRs still need to land; then I need to check whether E2E execution works on …
FYI, flash attention is not yet functional on …
Update: I'm investigating an outstanding issue with layout propagation at the beginning of the advanced path.
Update: Unchanged; still investigating.
Update: Still working on this.
No description provided.