
Not seeing much memory savings with FP8 optimizer suddenly #1499

Open

asahni04 opened this issue Jan 6, 2025 · 9 comments

@asahni04

asahni04 commented Jan 6, 2025

Not seeing much memory savings with the FP8 optimizer suddenly. I tried it on torchtitan with Llama 13B.

@gau-nernst
Collaborator

Do you have a snippet to reproduce the issue? Also, what are your PyTorch and torchao versions?

@asahni04
Author

asahni04 commented Jan 7, 2025

Hi @gau-nernst, I tried it with the torchtitan repo. I launched training for Llama 13B and 8B with FP8 AdamW (block_size 128) on H100s and see no memory savings at all: https://github.com/pytorch/torchtitan/blob/main/train_configs/llama3_8b.toml, using TP + DP on a single node (TP degree 8, DP degree 1).
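
(For context, a rough back-of-envelope of the savings one might expect here; these are editorial numbers, not from the thread, and assume fp32 optimizer state, an fp32 scale per quantization block, and state sharded evenly across the TP group:)

    # Adam keeps two states (exp_avg, exp_avg_sq) per parameter.
    n_params = 8e9                              # Llama 3 8B
    adamw_state = 2 * n_params * 4              # fp32 state: 4 bytes/element
    fp8_state = 2 * n_params * (1 + 4 / 128)    # fp8 state: 1 byte/element + one fp32 scale per 128-element block
    print(f"AdamW state:    {adamw_state / 2**30:.1f} GiB")   # ~59.6 GiB
    print(f"AdamWFp8 state: {fp8_state / 2**30:.1f} GiB")     # ~15.4 GiB
    print(f"Saved per GPU at TP degree 8: {(adamw_state - fp8_state) / 8 / 2**30:.1f} GiB")  # ~5.5 GiB

So at TP=8 the expected per-GPU saving is on the order of a few GiB, which can be easy to miss next to activations and framework overhead.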

@gau-nernst
Collaborator

I will try to reproduce. Btw, if you switch to AdamW8bit or AdamW4bit, do you observe memory savings?
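
(For reference, a minimal sketch of that swap, assuming the same spot in torchtitan's optimizer construction as the AdamWFp8 change further below; the lr/weight_decay values are placeholders:)

    from torchao.prototype.low_bit_optim import AdamW8bit, AdamW4bit

    # same call signature as torch.optim.AdamW; these also do not take fused/foreach
    optimizer = AdamW8bit(model.parameters(), lr=3e-4, weight_decay=0.1)
    # or: optimizer = AdamW4bit(model.parameters(), lr=3e-4, weight_decay=0.1)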

@gau-nernst
Collaborator

I can't reproduce the issue. On a 2xH100 machine from vast.ai, using:

data_parallel_replicate_degree = 1
data_parallel_shard_degree = -1
tensor_parallel_degree = 2

set NGPU=2

Changes for AdamWFp8 (replacing torchtitan's default optimizer construction):
            # if name == "Adam":
            #     # TODO: make the optimizer options configurable by toml/cmd args
            #     optimizer = torch.optim.Adam(model.parameters(), **optimizer_kwargs)
            # elif name == "AdamW":
            #     optimizer = torch.optim.AdamW(model.parameters(), **optimizer_kwargs)
            # else:
            #     raise NotImplementedError(f"Optimizer {name} not added.")

            from torchao.prototype.low_bit_optim import AdamWFp8

            # AdamWFp8 does not accept torch.optim's fused/foreach flags, so drop them
            optimizer_kwargs.pop("fused", None)
            optimizer_kwargs.pop("foreach", None)
            optimizer = AdamWFp8(model.parameters(), **optimizer_kwargs)

            self.optimizers.append(optimizer)

torch==2.7.0.dev20250107+cu126, torchtitan commit 90567fc98

(screenshot of the run attached)

Without TP, I observed that due to the selective activation checkpointing policy, memory consumption might be similar, but AdamWFp8 is faster end-to-end since there are fewer recomputed activations. You might want to set activation checkpointing to "full" to make sure the comparison is fair.
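
(One way to compare runs on equal footing, independent of throughput, is to log PyTorch's peak-memory counters directly around a few training steps; a generic sketch:)

    import torch

    torch.cuda.reset_peak_memory_stats()
    # ... run a few training steps ...
    print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
    print(f"peak reserved:  {torch.cuda.max_memory_reserved() / 2**30:.2f} GiB")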

@asahni04
Author

asahni04 commented Jan 9, 2025

@gau-nernst Hi, which version of torchao do you use?

@gau-nernst
Collaborator

Latest stable 0.7
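
(A quick way to confirm what is installed locally; both packages expose __version__:)

    import torch, torchao
    print(torch.__version__, torchao.__version__)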

@asahni04
Author

asahni04 commented Jan 9, 2025

Also, which config did you use? Did you use bf16 training?

@gau-nernst
Collaborator

The default Llama 8B config, with all the changes I mentioned in my previous reply.

@asahni04
Author

asahni04 commented Jan 9, 2025

Very weird, but I do notice savings on torchtitan, just not on my own modified model.
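
(A minimal isolation test may help narrow this down: compare optimizer-state memory for torch.optim.AdamW vs torchao's AdamWFp8 on a toy model, then swap in the modified model. This is an editorial sketch with placeholder sizes, assuming a single CUDA GPU (run it on the same H100 setup) and that optimizer state is created lazily on the first step:)

    import torch
    from torchao.prototype.low_bit_optim import AdamWFp8

    def optim_state_gib(optim_cls):
        torch.cuda.empty_cache()
        # toy model; replace with the modified model being debugged
        model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)]).cuda()
        params_only = torch.cuda.memory_allocated()

        optim = optim_cls(model.parameters(), lr=1e-4)
        x = torch.randn(8, 4096, device="cuda")
        model(x).sum().backward()
        optim.step()                         # exp_avg / exp_avg_sq are allocated here
        optim.zero_grad(set_to_none=True)    # free gradients so mostly state remains
        torch.cuda.synchronize()

        return (torch.cuda.memory_allocated() - params_only) / 2**30

    print("AdamW    state:", optim_state_gib(torch.optim.AdamW), "GiB")
    print("AdamWFp8 state:", optim_state_gib(AdamWFp8), "GiB")

If the FP8 number is not roughly 4x smaller for the toy model, the environment is suspect; if it is, the difference likely lies in how the modified model's parameters are created or sharded.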
