[#6 Attention Performance] extend attention support for Causal = True #1102
Comments
Update: I'm actively working on this. What I've found so far is that enabling causal masking affects the propagation of layout information in the kernel function. There are now two dependent loops, and the current block's indices are evaluated in the computation, so we need to support additional operations.
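The "two dependent loops" structure mentioned above can be sketched in plain NumPy (a hypothetical simplification for illustration, not the actual kernel code): the inner loop's trip count depends on the outer query-block index, and the block indices feed into the mask computation.

```python
import numpy as np

def causal_attention(q, k, v, block=4):
    """Block-wise attention with a causal mask (illustrative sketch).

    For each query block, the inner loop only visits key blocks at or
    before it, so the loops are dependent: the inner trip count is a
    function of the outer block index i.
    """
    n, d = q.shape
    out = np.zeros_like(q)
    for i in range(0, n, block):              # outer loop over query blocks
        qi = q[i:i + block]
        score_blocks = []
        for j in range(0, i + block, block):  # inner loop depends on i
            kj = k[j:j + block]
            s = qi @ kj.T / np.sqrt(d)
            # The current block indices enter the computation here:
            # mask out positions where the key index exceeds the query index.
            rows = np.arange(i, i + qi.shape[0])[:, None]
            cols = np.arange(j, j + kj.shape[0])[None, :]
            score_blocks.append(np.where(cols <= rows, s, -np.inf))
        s_full = np.concatenate(score_blocks, axis=1)
        p = np.exp(s_full - s_full.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        out[i:i + block] = p @ v[:s_full.shape[1]]
    return out
```

In a real kernel the same index arithmetic happens per tile, which is why the layout of the block indices has to propagate through the masking and softmax operations.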
Update: Still WIP. Locally, I have extended the …
Update: Still WIP. I'm investigating how to handle "row vectors" (e.g. …).
WIP.
Update: The PR is up; still working on tests.
Update: The linked PRs are under review.
Update (for 9/9/24): Two outstanding PRs still need to land; then I need to check whether E2E execution works on …
FYI, flash attention is not yet functional on …
Update: I'm investigating an outstanding issue with layout propagation at the beginning of the advanced path.
Update: Unchanged; still investigating.
Update: Still working on this.
No description provided.