
Add DDPWrapper #2479

Open
wants to merge 13 commits into base: juliagmt/test
Conversation

juliagmt-google (Collaborator)

Benchmark improvements: #2468

anijain2305 and others added 4 commits September 26, 2024 00:50
Summary:
This reverts commit 7743149b2be4a9eba7e0997ccdc6abe552bec266.

Reverts
* pytorch/pytorch#135503
* pytorch/pytorch#135502
* pytorch/pytorch#135422

This revert makes the following test pass. Previously, the getitem stayed as a getitem node in the FX graph, but now fake-tensor propagation fails, reporting that `.item()` is called. It seems the torch function override is not triggered during fake-tensor propagation.

```
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

torch.set_default_device('cuda')

flex_attention = torch.compile(flex_attention, dynamic=False)

# prefix_lengths[b] is the data-dependent getitem that fake-tensor
# propagation now rejects with a .item() error.
prefix_lengths = torch.arange(8)
def prefix_lm(b, h, q, kv):
    return prefix_lengths[b] >= kv

mask = create_block_mask(prefix_lm, 8, None, 512, 512, _compile=True)
```

X-link: pytorch/pytorch#136590
Approved by: https://github.com/Chillee

Reviewed By: atalman

Differential Revision: D63431470

Pulled By: anijain2305

fbshipit-source-id: 60915b30336121b845af71f423582c22a6c65c3f
Summary: Add a new metric, `--metric nsys`, to collect an nsys trace.

Reviewed By: htyu

Differential Revision: D63274918

fbshipit-source-id: 0536310df6290ea5f5a02d85cc0ad6d342d45dbd
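
For a sense of what collecting an nsys trace involves, here is a rough, hypothetical sketch of shelling out to NVIDIA's `nsys profile` around a benchmark command; the `run_under_nsys` helper and the commented invocation are illustrative, not the actual TritonBench implementation.

```
import subprocess
import sys

def run_under_nsys(benchmark_cmd, report_path="nsys_trace"):
    """Hypothetical helper: re-run a benchmark command under Nsight Systems.

    Assumes `nsys` is on PATH; recent versions write <report_path>.nsys-rep.
    """
    cmd = [
        "nsys", "profile",
        "-o", report_path,              # output report basename
        "--force-overwrite", "true",    # replace an existing report
    ] + list(benchmark_cmd)
    result = subprocess.run(cmd, check=False)
    if result.returncode != 0:
        print(f"nsys exited with {result.returncode}", file=sys.stderr)
    return report_path + ".nsys-rep"

# Illustrative usage (command line is a placeholder, not the real entry point):
# run_under_nsys([sys.executable, "run_benchmark.py", "--op", "gemm"])
```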
Summary:
pytorch#2458

Pull Request resolved: pytorch#2459

Reviewed By: xuzhao9

Differential Revision: D63476542

Pulled By: kit1980

fbshipit-source-id: 01e9db9cb03d34e82a773897417df2ccda410634
Summary: Pull Request resolved: pytorch#2473

Reviewed By: xuzhao9

Differential Revision: D63543625

Pulled By: bertmaher

fbshipit-source-id: 1693e15875544bda0f5f6c69daa5597fffd80509
@juliagmt-google (Collaborator, Author) left a comment:

test

Summary: Pull Request resolved: pytorch#2475

Reviewed By: htyu

Differential Revision: D63653081

Pulled By: xuzhao9

fbshipit-source-id: 8d840986779b6124cbccc2425c24e2b892d55ce4
Summary: We had the imports wrong for the internal port.

Reviewed By: xuzhao9, adamomainz

Differential Revision: D63643617

fbshipit-source-id: 04a49d419fede71d2681dedbfb55112a67cb4d55
Summary:
We have an old Triton internally that doesn't have the cublasLt bindings.

Reviewed By: adamomainz

Differential Revision: D63643619

fbshipit-source-id: 39aece74b52f7747fe2100d7bb905bad49ba1fa0
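
The fix itself just avoids the missing bindings; as a hedged illustration of the general pattern, the sketch below feature-detects an optional binding at import time so a backend can be skipped instead of crashing on older Triton builds. The module and attribute names in the commented usage are hypothetical.

```
import importlib

def has_optional_binding(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`.

    Illustrative guard only; the actual change simply stops relying on the
    cublasLt bindings when the installed Triton does not provide them.
    """
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(mod, attr)

# Hypothetical usage: skip a backend when the binding is absent.
# HAS_CUBLASLT = has_optional_binding("triton.ops", "cublaslt_matmul")
```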
Summary:
X-link: facebookresearch/FBGEMM#301

X-link: pytorch/FBGEMM#3202

Printing warnings to stdout mucks up the output of various tools/benchmarks

Reviewed By: xuzhao9, htyu

Differential Revision: D63643615

fbshipit-source-id: 1f34508a7fd36f5aa421e11bddd5ce77fc13038a
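
For context, benchmark harnesses typically parse stdout, so diagnostics belong on stderr. A minimal Python sketch of that pattern (not the actual FBGEMM change):

```
import sys
import warnings

def report_fallback(msg: str) -> None:
    # print(msg) would interleave with benchmark results on stdout;
    # warnings.warn goes to stderr by default, keeping stdout machine-parsable.
    warnings.warn(msg, stacklevel=2)
    # Equivalent low-tech alternative:
    # print(msg, file=sys.stderr)

report_fallback("falling back to the non-Cutlass kernel")  # illustrative message
```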
Summary: FBGEMM has changed how it declares its Cutlass-based blockwise gemm.

Reviewed By: htyu, sijiac, adamomainz

Differential Revision: D63643618

fbshipit-source-id: e46e3bbd2e07be0653f7c7fa6bd080b6c8db171e
Summary:
We have a big list of interesting shapes for blockwise/rowwise scaled gemm. A lot of these are variants of Llama. We might want to use them for gemm and fp8_gemm (unscaled) as well, but for now they are only used for blockwise/rowwise.

Reviewed By: xuzhao9, adamomainz

Differential Revision: D63643616

fbshipit-source-id: 328961fe8c91e66428fcd1e5b72c89813f58a5a3
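
To make the idea concrete, such a shape table is typically just a mapping from a name to an (M, N, K) triple. The entries below are placeholders for illustration only, not the actual list added in this commit:

```
# Illustrative only: the real shape list lives in the operator's config and
# includes Llama-derived sizes; these names and triples are placeholders.
LLAMA_LIKE_SHAPES = {
    "example_qkv_proj": (4096, 4096, 4096),
    "example_mlp_up":   (4096, 11008, 4096),
    "example_mlp_down": (4096, 4096, 11008),
}

def iter_shapes():
    """Yield (name, M, N, K) for each registered shape."""
    for name, (m, n, k) in LLAMA_LIKE_SHAPES.items():
        yield name, m, n, k
```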
Summary:
We were only benchmarking `row-major x row-major` gemms (also called
`TT` or `transpose-transpose`, because BLAS inherits FORTRAN's column-major
convention), which is not the common case: `nn.Linear` uses column-major
layouts for weights, which means `TN` is actually much more common.

Reviewed By: adamomainz

Differential Revision: D63714661

fbshipit-source-id: 735c25c59ddeb6596afd9b19f463af92036a830b
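
The layout point is easy to check in PyTorch: `nn.Linear` stores its weight as `(out_features, in_features)` and computes `x @ W.T`, so the weight operand reaches the GEMM transposed relative to the row-major activations, i.e. a `TN` problem. A minimal sketch:

```
import torch
import torch.nn as nn

lin = nn.Linear(in_features=512, out_features=1024, bias=False)
x = torch.randn(64, 512)

# Weight is stored (out_features, in_features), so the forward pass is x @ W.T:
# the second GEMM operand is transposed relative to the row-major input.
print(lin.weight.shape)            # torch.Size([1024, 512])
y1 = lin(x)
y2 = x @ lin.weight.T
print(torch.allclose(y1, y2))      # True
```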