
[#6 Attention Performance] extend attention support for Causal = True #1102

Open

Dewei-Wang-sh opened this issue May 13, 2024 · 11 comments · Fixed by #2013, #2026, #2043, #2045 or #2046
Labels: enhancement (New feature or request), performance

Comments

@Dewei-Wang-sh
Contributor

No description provided.

@jopperm
Contributor

jopperm commented Jul 29, 2024

Update: I'm actively working on this. What I've found out so far is that activating causal masking affects the propagation of layout information in the kernel function. There are now two dependent loops, and the current block's indices are evaluated in the computation, so we need to support additional operations (tt.make_range, arith.select and arith.cmpi) related to the mask computation.
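For concreteness, here is a minimal sketch of the mask computation that introduces those ops (modeled on Triton's fused-attention tutorial, not this repo's exact kernel; names and shapes are illustrative): `tl.arange` lowers to `tt.make_range`, the `>=` comparison to `arith.cmpi`, and `tl.where` to `arith.select`.

```python
import triton
import triton.language as tl

@triton.jit
def causal_mask_sketch(out_ptr, start_n, BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Row/column indices of the current tile; tl.arange lowers to tt.make_range.
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    scores = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
    # The causal condition compares row vs. column indices (arith.cmpi) ...
    mask = offs_m[:, None] >= (start_n + offs_n[None, :])
    # ... and masked positions are replaced with -inf (arith.select).
    scores = tl.where(mask, scores, float("-inf"))
    tl.store(out_ptr + offs_m[:, None] * BLOCK_N + offs_n[None, :], scores)
```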

@jopperm
Contributor

jopperm commented Aug 5, 2024

Update: Still WIP. Locally, I have extended the perf-attn branch enough to reach LLVM-IR generation, though a couple of illegal instructions are being created, which I believe originate at the TTGIR level.
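For anyone trying to reproduce, one way to locate such instructions is to compile a kernel and dump the intermediate IR. A sketch, assuming Triton's usual compiled-kernel interface (the `.asm` dict with `ttgir`/`llir` keys) is available on this backend:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def trivial(out_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    tl.store(out_ptr + offs, offs.to(tl.float32))

out = torch.empty(64, device="xpu")       # "xpu" assumes the Intel backend is set up
compiled = trivial[(1,)](out, BLOCK=64)   # launching returns the compiled kernel
print(compiled.asm["ttgir"])  # TTGIR, the stage the bad instructions seem to come from
print(compiled.asm["llir"])   # generated LLVM IR
```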

@jopperm
Contributor

jopperm commented Aug 12, 2024

Update: Still WIP. I'm investigating how to handle "row vectors" (e.g. tensor<1x64>) correctly.
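For context, a hypothetical illustration (not the exact code under investigation) of where such row vectors come from: expanding a 1-D range for broadcasting produces shapes like tensor<1x64xi32> at the TTIR level.

```python
import triton
import triton.language as tl

@triton.jit
def row_vector_sketch(out_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)  # tensor<64xi32> for BLOCK = 64
    row = offs[None, :]         # "row vector" tensor<1x64xi32> (tt.expand_dims)
    col = offs[:, None]         # column vector tensor<64x1xi32>
    # Broadcasting the two back to a full 2-D tile (tt.broadcast).
    tile = col * BLOCK + row
    tl.store(out_ptr + tile, tile.to(tl.float32))
```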

@jopperm
Contributor

jopperm commented Aug 19, 2024

WIP.

@jopperm jopperm linked a pull request Aug 20, 2024 that will close this issue
@jopperm
Contributor

jopperm commented Aug 26, 2024

Update: PR is up, still working on tests.

@jopperm
Contributor

jopperm commented Sep 2, 2024

Update: The linked PRs are under review.

@jopperm
Contributor

jopperm commented Sep 6, 2024

Update (for 9/9/24): Two outstanding PRs to land, then I need to check if E2E execution works on llvm-target.

@whitneywhtsang
Contributor

> Update (for 9/9/24): Two outstanding PRs to land, then I need to check if E2E execution works on llvm-target.

FYI, flash attention is not yet functional on llvm-target; you can apply #2061 when testing.

@jopperm
Contributor

jopperm commented Sep 16, 2024

Update: I'm investigating an outstanding issue with the layout propagation at the beginning of the advanced path.

@jopperm
Contributor

jopperm commented Sep 23, 2024

Update: Unchanged; still investigating.

@jopperm
Contributor

jopperm commented Sep 30, 2024

Update: Still working on this.

@jopperm jopperm linked a pull request Oct 4, 2024 that will close this issue