Use FusedSDPA for MllamaVisionSdpaAttention #620

Merged: 5 commits, Jan 8, 2025

@kdamaszk kdamaszk commented Dec 11, 2024

Use FusedSDPA instead of the regular F.scaled_dot_product_attention in the MllamaVisionSdpaAttention module.

The two ops differ in precision: F.scaled_dot_product_attention converts the input to float32 and performs all operations in that data type, while FusedSDPA does not. However, this changes accuracy only from 0.449 to 0.446 on an accuracy test based on the MMMU dataset and lm-evaluation-harness, while single-prompt latency improves from ~550 ms to ~100 ms.
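
To illustrate why skipping the float32 upcast changes results only slightly, here is a minimal pure-Python sketch of single-query scaled dot-product attention evaluated two ways: with wide intermediates that are rounded once at the end, and with every intermediate rounded to a narrow type. It is not the Gaudi kernel and not the code in this PR. Everything in it is an assumption made for illustration: IEEE half precision (via `struct`) stands in for the model's low-precision dtype, Python floats stand in for float32, and the helper names `to_half` and `attention_row` are hypothetical.

```python
import math
import random
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def keep_wide(x: float) -> float:
    """No rounding: intermediates stay in the wide type."""
    return x


def attention_row(q, keys, values, round_fn):
    """Single-query scaled dot-product attention.

    round_fn is applied after every arithmetic step to emulate the
    precision in which intermediates are stored.
    """
    scale = 1.0 / math.sqrt(len(q))

    # Scaled dot products between the query and each key.
    scores = []
    for k in keys:
        acc = 0.0
        for qi, ki in zip(q, k):
            acc = round_fn(acc + round_fn(qi * ki))
        scores.append(round_fn(acc * scale))

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [round_fn(math.exp(s - top)) for s in scores]
    denom = 0.0
    for e in exps:
        denom = round_fn(denom + e)
    probs = [round_fn(e / denom) for e in exps]

    # Probability-weighted sum of the value vectors.
    out = []
    for d in range(len(values[0])):
        acc = 0.0
        for p, v in zip(probs, values):
            acc = round_fn(acc + round_fn(p * v[d]))
        out.append(acc)
    return out


# Illustrative inputs, already stored in the narrow type.
rng = random.Random(0)
q = [to_half(rng.uniform(-1, 1)) for _ in range(8)]
keys = [[to_half(rng.uniform(-1, 1)) for _ in range(8)] for _ in range(4)]
values = [[to_half(rng.uniform(-1, 1)) for _ in range(4)] for _ in range(4)]

# Path A: wide intermediates, output rounded once at the end.
upcast_out = [to_half(o) for o in attention_row(q, keys, values, keep_wide)]
# Path B: every intermediate rounded to the narrow type.
native_out = attention_row(q, keys, values, to_half)

max_diff = max(abs(a - b) for a, b in zip(upcast_out, native_out))
print("max abs difference between the two paths:", max_diff)
```

In this toy setup the two paths agree to within a few rounding steps of the narrow type, which is consistent in spirit with the small accuracy change reported above. It says nothing about the speedup, which comes from the fused kernel itself.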

@kdamaszk kdamaszk marked this pull request as draft December 13, 2024 08:48
@kdamaszk kdamaszk marked this pull request as ready for review January 7, 2025 10:30
@kdamaszk kdamaszk requested a review from vivekgoe as a code owner January 7, 2025 10:30
@kdamaszk kdamaszk added the habana Issues or PRs submitted by Habana Labs label Jan 7, 2025
michalkuligowski pushed a commit that referenced this pull request Jan 8, 2025
This is an updated version of #650.

Coupled with [Use FusedSDPA for MllamaVisionSdpaAttention #620], it resolves two issues that arise when running the Llama 3.2 vision model:

GC failure when batch size > 1 on Gaudi3.
Increased device memory consumption with Torch 2.5 compared to Torch 2.4.

---------

Signed-off-by: yan ma <[email protected]>
Co-authored-by: yisonzhu <[email protected]>
@michalkuligowski michalkuligowski merged commit cccf363 into habana_main Jan 8, 2025
13 checks passed
@michalkuligowski michalkuligowski deleted the dev/kdamaszke/mllama_fusedsdpa branch January 8, 2025 16:28