Adds ATen fallback for scaled_dot_product_attention #21107
Merged
Conversation
Alerts fixed in this PR:
- orttraining/orttraining/python/training/ortmodule/_custom_op_symbolic_registry.py (8 alerts)
- orttraining/orttraining/python/training/ortmodule/_custom_gradient_registry.py (1 alert)

Alerts dismissed:
- orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py (1 alert)
pengwa reviewed Jul 17, 2024
orttraining/orttraining/python/training/ortmodule/_custom_gradient_registry.py (resolved)

pengwa reviewed Jul 17, 2024
orttraining/orttraining/python/training/ortmodule/_custom_op_symbolic_registry.py (resolved)

pengwa approved these changes Jul 19, 2024
Description

Introduces an ATen fallback for torch.nn.functional.scaled_dot_product_attention. This operator was introduced in PyTorch 2.0 and has since received many updates, including a memory-efficient attention implementation for V100 machines. The current TorchScript exporter exports attention as a subgraph of primitive ops, which does not provide the memory savings of PyTorch's memory-efficient attention kernel. Allowing a fallback to the PyTorch ATen op for attention helps mitigate memory spikes in models that leverage memory-efficient attention.

Motivation and Context

Memory issues arose when integrating ONNX Runtime Training with AML Stable Diffusion.
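
As a rough illustration of the mechanism, the sketch below shows the general pattern for routing an op through ONNX Runtime's ATen fallback during TorchScript export: instead of decomposing attention into MatMul/Softmax primitives, the symbolic function emits a single `org.pytorch.aten::ATen` node that ONNX Runtime executes by dispatching back into the PyTorch kernel. This is not the PR's exact code; the function name, argument handling, and opset version are assumptions for the example.

```python
# Hedged sketch: illustrative names, not the exact registration done in
# _custom_op_symbolic_registry.py.
import torch
from torch.onnx import register_custom_op_symbolic


def sdpa_aten_fallback(g, query, key, value, attn_mask=None,
                       dropout_p=0.0, is_causal=False, scale=None):
    # Emit one ATen node rather than an attention subgraph, so ONNX Runtime
    # runs PyTorch's fused scaled_dot_product_attention kernel (including
    # the memory-efficient backend) at execution time.
    # Real code must also handle the case where attn_mask/scale are None.
    return g.op(
        "org.pytorch.aten::ATen",
        query, key, value, attn_mask, dropout_p, is_causal, scale,
        operator_s="scaled_dot_product_attention",
    )


# Opset 14 is an assumption for the example; ORTModule performs an
# equivalent registration internally when the fallback is enabled.
register_custom_op_symbolic(
    "aten::scaled_dot_product_attention", sdpa_aten_fallback, 14
)
```

Because the ATen node is opaque to ONNX, training also needs a matching backward definition, which is why this PR touches _custom_gradient_registry.py alongside the symbolic registry.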