fix mha for the case that present kv is not consumed #21777
Conversation
// since there is no buffer for it.
// We check by requesting the output and if not there we'll adjust context.outputCount
const presentKeyShape = [
  parameters.batchSize,
This shape only works for MHA and GQA.
Attention's output 1 has shape [2, B, N, T, H] instead of [B, N, T, H], since it concatenates present_key and present_value into a single present output.
I think extra code is needed here, something like:
if (attention op) { // can we get the operator name from context? Maybe we can use context.outputCount === 2, since MHA and GQA have 3 outputs when present_key is needed.
  // insert 2 at the beginning of the present shape.
}
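A minimal sketch of that check, assuming a hypothetical isAttentionOp flag and field names such as numHeads, totalSequenceLength, and headSize on parameters (none of these are taken from the actual diff):

// Sketch only: `isAttentionOp`, `numHeads`, `totalSequenceLength` and `headSize`
// are assumed names used for illustration.
const baseShape = [
  parameters.batchSize,
  parameters.numHeads,
  parameters.totalSequenceLength,
  parameters.headSize,
];
// The Attention op stacks present_key and present_value into one `present`
// output, so its shape gains a leading dimension of 2.
const presentShape = isAttentionOp ? [2, ...baseShape] : baseShape;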
We also need to consider another special case for GQA, where past and present share buffers. In that case, the length is the max sequence length (see the sketch below).
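A sketch of how that shared-buffer case might be handled, assuming hypothetical names pastPresentShareBuffer and maxSequenceLength on parameters:

// Sketch only: `pastPresentShareBuffer` and `maxSequenceLength` are assumed names.
// When past and present share one buffer (the GQA case above), the present output
// spans the buffer's full capacity rather than just past + new tokens.
const presentSequenceLength = parameters.pastPresentShareBuffer
  ? parameters.maxSequenceLength
  : parameters.totalSequenceLength;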
With #21782 this one is no longer needed.
No description provided.