Add packed QKV and rotary embedding within GroupQueryAttention to model builder #245

kunal-vaishnavi · 2024-04-02T00:06:56Z

Description

This PR fuses rotary embeddings into GroupQueryAttention and combines the separate Q/K/V MatMuls and Q/K/V Adds into packed QKV MatMuls and packed QKV Adds.
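
For readers unfamiliar with the packing idea, here is a minimal sketch of why one packed QKV MatMul and Add can stand in for three separate ones. It is not the model builder's code: plain Python lists stand in for tensors, and every name, shape, and value below (`matmul`, `add_bias`, `hconcat`, `w_q`, `q_size`, and so on) is a hypothetical chosen for illustration. It assumes the packed weight is formed by concatenating the Q, K, and V weights along the output dimension.

```python
# Illustrative sketch only: plain lists stand in for tensors, all names are hypothetical.

def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def add_bias(y, b):
    """Add a bias vector to every row of y."""
    return [[v + bv for v, bv in zip(row, b)] for row in y]

def hconcat(*mats):
    """Concatenate matrices along the column (output) dimension."""
    return [sum(rows, []) for rows in zip(*mats)]

# Assumed toy shapes: hidden size 4, Q output size 4, K/V output size 2
# (K and V are narrower than Q, as with grouped-query attention).
q_size, kv_size = 4, 2
x = [[1, 2, 3, 4], [0, -1, 2, 1]]                                  # (seq_len=2, hidden=4)
w_q = [[1, 0, 2, 1], [0, 1, 1, 0], [2, 1, 0, 1], [1, 1, 1, 1]]     # (4, 4)
w_k = [[1, 2], [0, 1], [1, 0], [2, 1]]                             # (4, 2)
w_v = [[0, 1], [1, 1], [2, 0], [1, 2]]                             # (4, 2)
b_q, b_k, b_v = [1, 0, -1, 2], [0, 1], [1, -1]

# Unpacked: three MatMuls and three Adds.
q = add_bias(matmul(x, w_q), b_q)
k = add_bias(matmul(x, w_k), b_k)
v = add_bias(matmul(x, w_v), b_v)

# Packed: one MatMul and one Add over the concatenated weight and bias.
w_qkv = hconcat(w_q, w_k, w_v)   # (4, 8)
b_qkv = b_q + b_k + b_v          # length 8
qkv = add_bias(matmul(x, w_qkv), b_qkv)

# Slicing the packed output recovers the individual projections.
q_packed = [row[:q_size] for row in qkv]
k_packed = [row[q_size:q_size + kv_size] for row in qkv]
v_packed = [row[q_size + kv_size:] for row in qkv]

packed_matches = (q_packed, k_packed, v_packed) == (q, k, v)
print(packed_matches)  # True
```

The toy values are integers so the comparison is exact. With floating-point weights the two paths would be expected to agree only up to rounding.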

Motivation and Context

Fusing rotary embeddings into GroupQueryAttention helps improve model performance at large sequence lengths. Combining the individual Q/K/V MatMuls and Q/K/V Adds into packed QKV MatMuls and packed QKV Adds helps improve model performance in general.

kunal-vaishnavi added 14 commits March 9, 2024 02:03

Add fusion of RotaryEmbedding in GroupQueryAttention

076cf20

Remove design reference

28c9635

Merge branch 'main' into kvaishnavi/rotemb-in-gqa

e2b33e8

Support packed MatMul and packed Add

12c55c6

Update model builder README

bb96d49

Remove logger

e4cbae3

Merge branch 'main' into kvaishnavi/rotemb-in-gqa

ec5b7f7

Ignore CSV files

1737fba

Add input names and types

0294d05

Clean up commented out code

c0df659

Remove commented out line

43081bf

Generalize variable name

01fb32f

Merge branch 'main' into kvaishnavi/rotemb-in-gqa

39765f1

Merge branch 'main' into kvaishnavi/rotemb-in-gqa

a6de684

natke merged commit 18adb67 into main Apr 8, 2024
10 of 11 checks passed

natke deleted the kvaishnavi/rotemb-in-gqa branch April 8, 2024 23:47
