
Should I use linear layers for the input and output of FlashAttention? #247

Open
chenhengx0101 opened this issue Jul 21, 2024 · 3 comments
Labels: bug (Something isn't working), no-issue-activity

Comments

@chenhengx0101

chenhengx0101 commented Jul 21, 2024

I'm curious: do I have to apply separate linear layers to q, k, and v before passing them into FlashAttention? And once I get the output from FlashAttention, do I still need a linear output projection?

I look forward to your reply! Thanks!

@chenhengx0101 added the bug (Something isn't working) label Jul 21, 2024

Hello there, thank you for opening an issue! 🙏🏻 The team was notified and they will get back to you asap.

@kyegomez
Owner

@chelxu the linear projections are already handled inside the FlashAttention module. You just pass it the initial token embeddings, apply the MLP afterwards, and repeat that block N times.
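For illustration, here is a minimal PyTorch sketch of that pattern. It is not the actual API of this repo's FlashAttention: the class names `FlashAttentionBlock` and `Block` are hypothetical, and `torch.nn.functional.scaled_dot_product_attention` stands in for the flash kernel. The point is only that the q/k/v and output projections live inside the attention module, so you feed it token embeddings directly and don't add your own linear layers around it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical FlashAttention-style block: q/k/v and output projections
# are internal, so the caller only supplies token embeddings.
class FlashAttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)  # internal q/k/v projection
        self.to_out = nn.Linear(dim, dim)                   # internal output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        h = self.heads
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, head_dim) for scaled_dot_product_attention
        q, k, v = (t.view(b, n, h, d // h).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)  # dispatches to flash kernels when available
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)

# One transformer layer: attention followed by an MLP, stacked N times.
class Block(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = FlashAttentionBlock(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

tokens = torch.randn(2, 128, 512)                        # (batch, seq_len, dim) token embeddings
layers = nn.ModuleList([Block(512) for _ in range(6)])   # N = 6 layers
for layer in layers:
    tokens = layer(tokens)
print(tokens.shape)  # torch.Size([2, 128, 512])
```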

What are you trying to build?

Also, I suggest you join the Agora Discord for real-time support:

https://discord.gg/agora-999382051935506503


Stale issue message
