add axiswise granularity to Float8Tensor #919
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/919
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 1f01df9 with merge base 5dd0132.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
I skimmed through it, LGTM!
I have one suggestion for a more streamlined and extensible API, see below.
scaling_granularity=ScalingGranularity.AXISWISE,
axiswise_dim=axiswise_dim,
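For context on what these parameters mean, here is a minimal sketch of axiswise (row-wise) scaling in plain PyTorch; the helper name and the e4m3 dtype choice are illustrative and not the exact code added in this PR:

```python
import torch

def to_float8_axiswise(hp_tensor: torch.Tensor, axiswise_dim: int,
                       float8_dtype=torch.float8_e4m3fn):
    # One scale per slice along `axiswise_dim`: reduce max-abs over that dim,
    # keeping it as a size-1 dim so the scale broadcasts against the input.
    amax = hp_tensor.abs().amax(dim=axiswise_dim, keepdim=True).float()
    scale = torch.finfo(float8_dtype).max / amax.clamp(min=1e-12)
    fp8_max = torch.finfo(float8_dtype).max
    data_fp8 = (hp_tensor.float() * scale).clamp(-fp8_max, fp8_max).to(float8_dtype)
    return data_fp8, scale
```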
Suggestion for another API: instead of an enum plus extra params on a case-by-case basis, we could reuse the same idea that @drisspg used in the `_scaled_mm` operator and deduce the kind of scaling from the size/shape of the desired scale tensor!

Concretely, we could add a single `scale_shape=...` parameter, which for row-wise scaling would be `[-1, 1]`, indicating that:
- all columns (second dim) should be grouped and reduced into a single scaling factor (because the second element has a value of 1)
- for the rows (first dim) there should be as many scaling factors as there are rows (because the first element has a value of -1, which gets replaced with the corresponding dim of the input tensor).

The scale shape is right-aligned to the shape of the tensor (thus following PyTorch's standard broadcast semantics) and then left-padded with `1` (again, standard semantics). This means that tensor-wise scaling is expressed as `scale_shape=[]`.

Using this convention would later let us express block-wise scaling (e.g., 128x128), group-wise scaling (1x128), and maybe even column-wise scaling if that ever becomes a thing!
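A minimal sketch of how that deduction could work (the `resolve_scale_shape` helper and the `scale_shape` parameter are hypothetical, just illustrating the convention described above):

```python
import torch

def resolve_scale_shape(tensor: torch.Tensor, scale_shape) -> list:
    # Right-align scale_shape with the tensor's shape and left-pad with 1s
    # (standard broadcast semantics). A -1 entry means "one scale per element
    # of that dim"; a 1 entry means "reduce that whole dim to a single scale".
    padded = [1] * (tensor.dim() - len(scale_shape)) + list(scale_shape)
    return [t if s == -1 else s for t, s in zip(tensor.shape, padded)]

x = torch.randn(128, 256)
resolve_scale_shape(x, [-1, 1])  # [128, 1] -> row-wise (axiswise) scaling
resolve_scale_shape(x, [])       # [1, 1]   -> tensor-wise scaling
```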
One wrinkle to work through would be that `Float8Tensor` can be of any rank, but operand inputs to `torch._scaled_mm` are required to be of rank 2, to match `torch.mm`/`torch.addmm`.

I'm definitely open to making this more flexible in the future. We've been careful to keep `Float8Tensor` and these utility functions out of the public API, to give us the freedom to make these kinds of changes as other scaling types become more important.
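To illustrate the rank constraint (this is not code from this PR, and it glosses over `torch._scaled_mm`'s layout requirements; the keyword signature shown is an assumption that has varied across PyTorch versions): a higher-rank input would have to be flattened to 2D around the matmul and reshaped back afterwards, roughly like this:

```python
import torch

def scaled_mm_any_rank(a_fp8, b_fp8, scale_a, scale_b, out_dtype=torch.bfloat16):
    # torch._scaled_mm only accepts rank-2 operands (to match torch.mm/torch.addmm),
    # so collapse all leading dims of `a` into a single row dim, then restore them.
    # Per-row scales would need the matching reshape as well.
    leading_shape = a_fp8.shape[:-1]
    a_2d = a_fp8.reshape(-1, a_fp8.shape[-1])
    out_2d = torch._scaled_mm(a_2d, b_fp8, scale_a=scale_a, scale_b=scale_b,
                              out_dtype=out_dtype)
    return out_2d.reshape(*leading_shape, out_2d.shape[-1])
```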
also, if someone puts up a PR for ^, sgtm!
@@ -191,15 +188,6 @@ def __init__(self, *args, **kwargs):
# would be initialized in every iteration.
self.enable_pre_and_post_forward = self.config.enable_pre_and_post_forward

# See the comments in config.py for more details of this option.
technically not related to this PR, but this keeps the test logs non-spammy for now; we can add it back in a better way later
* Use ao's int4 quantizer
* Point AO to commit hash of Jerry's fix
* When device is cuda, only run for dtype==bfloat16
* Typo
* Use tensor subclass for int4 weight only quant
* Fix bug
* Fix
* Use both quantizer and subclass API
* Bug
* unwrap tensor subclass for aoti
* Add import
* Eval fix
* Evaluate AOTI

Co-authored-by: Mengwei Liu <[email protected]>
Summary:
This is a copy-paste of pytorch-labs/float8_experimental#352
which never landed.