
Should I use linear layers for the input and output of FlashAttention? #247

Open
chenhengx0101 opened this issue Jul 21, 2024 · 3 comments
Labels: bug (Something isn't working), no-issue-activity

Comments

@chenhengx0101

chenhengx0101 commented Jul 21, 2024

I'm curious: do I have to apply separate linear layers to q, k, and v before passing them into FlashAttention? And once I get the output from FlashAttention, do I still need a linear output projection?

I look forward to your reply! Thanks!

@chenhengx0101 added the bug (Something isn't working) label Jul 21, 2024

Hello there, thank you for opening an issue! 🙏🏻 The team was notified and they will get back to you asap.

@kyegomez
Owner

@chelxu the linear projections are already handled inside the FlashAttention module. You just pass it the initial token embeddings, apply the MLP afterwards, and repeat that block N times.
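For illustration, here is a minimal PyTorch sketch of that pattern. It is not the actual API of this repo's FlashAttention: the class names `FlashAttentionBlock` and `Block` are hypothetical, and `torch.nn.functional.scaled_dot_product_attention` stands in for the flash kernel. The point is only that the q/k/v and output projections live inside the attention module, so you feed it token embeddings directly and don't add your own linear layers around it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical FlashAttention-style block: q/k/v and output projections
# are internal, so the caller only supplies token embeddings.
class FlashAttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)  # internal q/k/v projection
        self.to_out = nn.Linear(dim, dim)                   # internal output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        h = self.heads
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, head_dim) for scaled_dot_product_attention
        q, k, v = (t.view(b, n, h, d // h).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)  # dispatches to flash kernels when available
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)

# One transformer layer: attention followed by an MLP, stacked N times.
class Block(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = FlashAttentionBlock(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

tokens = torch.randn(2, 128, 512)                        # (batch, seq_len, dim) token embeddings
layers = nn.ModuleList([Block(512) for _ in range(6)])   # N = 6 layers
for layer in layers:
    tokens = layer(tokens)
print(tokens.shape)  # torch.Size([2, 128, 512])
```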

What are you trying to build?

Also, I suggest you join the Agora Discord for real-time support:

https://discord.gg/agora-999382051935506503


Stale issue message
