Use FusedSDPA for MllamaVisionSdpaAttention #620

kdamaszk · 2024-12-11T12:45:02Z

Use FusedSDPA instead of the regular F.scaled_dot_product_attention in the MllamaVisionSdpaAttention module.

The difference between these two ops is precision. F.scaled_dot_product_attention converts the inputs to float32 and performs all operations in that data type, while FusedSDPA does not. This changes accuracy only from 0.449 to 0.446 on an accuracy test based on the MMMU dataset and lm-evaluation-harness. In exchange, single-prompt latency improves from ~550 ms to ~100 ms, which is roughly 5.5x faster.
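The sketch below illustrates the precision trade-off described above. It is pure Python and does not call any Habana or PyTorch APIs. It makes two assumptions. First, Python's float64 arithmetic stands in for the float32 upcast path. Second, the low-precision path rounds every intermediate result to bfloat16. That second assumption is a deliberate worst case: the PR does not describe FusedSDPA's internal precision, and real kernels typically accumulate matmuls in higher precision. The helper names (`to_bf16`, `single_query_attention`, `make_inputs`) are hypothetical.

```python
import math
import random
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round half to even).

    bfloat16 keeps the upper 16 bits of an IEEE-754 float32, so we round the
    float32 bit pattern and zero the low 16 bits. Assumes finite, in-range x.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1  # lowest kept bit, used for ties-to-even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def identity(x: float) -> float:
    return x


def single_query_attention(q, keys, values, rnd=identity):
    """Scaled dot-product softmax attention for one query.

    `rnd` is applied after every arithmetic op; passing to_bf16 emulates
    (pessimistically) doing all of the math in bfloat16.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = []
    for k in keys:
        acc = 0.0
        for a, b in zip(q, k):
            acc = rnd(acc + rnd(a * b))
        scores.append(rnd(acc * scale))
    m = max(scores)  # subtract the max for numerical stability
    exps = [rnd(math.exp(rnd(s - m))) for s in scores]
    total = 0.0
    for e in exps:
        total = rnd(total + e)
    probs = [rnd(e / total) for e in exps]
    out = []
    for d in range(len(values[0])):
        acc = 0.0
        for p, v in zip(probs, values):
            acc = rnd(acc + rnd(p * v[d]))
        out.append(acc)
    return out


def make_inputs(seq_len=16, dim=64, seed=0):
    """Deterministic random inputs that are exactly representable in bf16."""
    rng = random.Random(seed)

    def row():
        return [to_bf16(rng.uniform(-1.0, 1.0)) for _ in range(dim)]

    q = row()
    keys = [row() for _ in range(seq_len)]
    values = [row() for _ in range(seq_len)]
    return q, keys, values


q, keys, values = make_inputs()
reference = single_query_attention(q, keys, values)         # high-precision path
low = single_query_attention(q, keys, values, rnd=to_bf16)  # bf16 everywhere
max_abs_diff = max(abs(r - l) for r, l in zip(reference, low))
print(f"max |reference - bf16| = {max_abs_diff:.5f}")
```

The two paths disagree only slightly on the same inputs. This matches the small accuracy change reported for this PR, but the sketch does not model the FusedSDPA kernel itself.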

This is an updated version of #650. Together with [Use FusedSDPA for MllamaVisionSdpaAttention #620], it resolves two issues that arise when running the Llama 3.2 vision model: a GC failure when batch size > 1 on Gaudi3, and increased device memory consumption with Torch 2.5 compared to Torch 2.4.

---------

Signed-off-by: yan ma <[email protected]>
Co-authored-by: yisonzhu <[email protected]>

Use FusedSDPA for MllamaVisionSdpaAttention

ed5a4d9

kdamaszk requested review from kzawora-intel, madamczykhabana, michalkuligowski and mgawarkiewicz as code owners December 11, 2024 12:45

kdamaszk added 2 commits December 11, 2024 15:06

Merge branch 'habana_main' into dev/kdamaszke/mllama_fusedsdpa

e7107b4

Merge branch 'habana_main' into dev/kdamaszke/mllama_fusedsdpa

8b7bfe4

kdamaszk marked this pull request as draft December 13, 2024 08:48

yisonzhu mentioned this pull request Dec 19, 2024

Add mark_step for encoder layers #650

Closed

kdamaszk added 2 commits December 30, 2024 08:27

Merge branch 'habana_main' into dev/kdamaszke/mllama_fusedsdpa

f10b2bf

Merge branch 'habana_main' into dev/kdamaszke/mllama_fusedsdpa

f307b45

kdamaszk marked this pull request as ready for review January 7, 2025 10:30

kdamaszk requested a review from vivekgoe as a code owner January 7, 2025 10:30

kdamaszk added the habana Issues or PRs submitted by Habana Labs label Jan 7, 2025

yma11 mentioned this pull request Jan 8, 2025

Add mark_step for encoder layers #669

Merged

michalkuligowski approved these changes Jan 8, 2025


michalkuligowski merged commit cccf363 into habana_main Jan 8, 2025
13 checks passed

michalkuligowski deleted the dev/kdamaszke/mllama_fusedsdpa branch January 8, 2025 16:28


kdamaszk commented Dec 11, 2024 • edited by github-actions bot
