OptimizedLinear updates #5791

Merged
merged 20 commits · Aug 14, 2024

Conversation

@jeffra (Collaborator) commented Jul 23, 2024

This is a refresh of OptimizedLinear, with the following features to improve performance and usability:

  • More efficient sharing of base weights using all_gather_into_tensor (see the all-gather sketch below)
  • Flattened sharded weights
  • Selective offload of frozen weights to CPU
  • deepspeed.linear.Init, which allows injecting OptimizedLinear during model construction (similar to zero.Init); see the sketch after this list
  • Support for load_state_dict directly in OptimizedLinear, which allows loading HF model weights correctly into sharded params
  • Various bug fixes for the previously introduced LoRA implementation
  • Several new unit tests
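
To make the construction path concrete, here is a minimal sketch of how deepspeed.linear.Init might be used, by analogy to zero.Init; the config field names and Init keyword arguments below are assumptions for illustration, not taken from this PR:

```python
import torch.nn as nn
from deepspeed.linear import Init, LoRAConfig, QuantizationConfig

# Hypothetical configs -- the field names here are assumptions.
lora_cfg = LoRAConfig(lora_r=64, lora_alpha=16)
quant_cfg = QuantizationConfig(q_bits=8)

# Inside the context, Linear layers created during model construction
# are injected as OptimizedLinear (analogous to deepspeed.zero.Init).
with Init(lora_config=lora_cfg, quant_config=quant_cfg):
    model = nn.Sequential(
        nn.Linear(4096, 11008),
        nn.GELU(),
        nn.Linear(11008, 4096),
    )

# With load_state_dict now supported directly on OptimizedLinear, an HF
# checkpoint can then be loaded into the sharded params the usual way:
#   model.load_state_dict(hf_state_dict)  # hf_state_dict assumed to exist
```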

Builds on top of @RezaYazdaniAminabadi's previous FP8 updates (#5764) to support FP8 quantization for dense models.

Example of using this to fine-tune Llama 3.1 405B on a single node: https://github.com/Snowflake-Labs/snowflake-arctic/tree/main/training/llama3.1
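
For the base-weight sharing bullet above, the collective pattern implied by flattened, sharded weights looks roughly like the following. torch.distributed.all_gather_into_tensor is the real collective named in the description, but the function, names, and shapes here are illustrative, not the PR's actual implementation:

```python
import torch
import torch.distributed as dist

def gather_base_weight(shard: torch.Tensor, group=None) -> torch.Tensor:
    """Reassemble the full flattened base weight from per-rank shards.

    Each rank holds a contiguous 1-D shard of the flattened weight. A
    single all_gather_into_tensor fills one preallocated output buffer,
    avoiding the per-tensor overhead of list-based all_gather.
    """
    world_size = dist.get_world_size(group)
    full = torch.empty(world_size * shard.numel(),
                       dtype=shard.dtype, device=shard.device)
    dist.all_gather_into_tensor(full, shard, group=group)
    # Caller reshapes to (out_features, in_features) before the matmul.
    return full
```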

@jeffra (Collaborator, Author) commented Aug 10, 2024

nv-accelerate-v100 results on H100 (torch 2.4 + cu12.2):
[screenshot: nv-accelerate-v100 test results]

nv-torch-latest-v100 results on H100 (torch 2.4 + cu12.2):

pytest --forked -n 4 unit/ --torch_ver="2.4" --cuda_ver="12.1" &> run1.log
pytest --forked -m 'sequential' unit/ --torch_ver="2.4" --cuda_ver="12.1" &> run2.log

[screenshot: nv-torch-latest-v100 test results]

@jeffra (Collaborator, Author) commented Aug 13, 2024

I'm able to get both the nv-accelerate-v100 and nv-torch-latest-v100 workflows to pass with this branch on my local H100 node (see previous comment). /cc @tjruwase @HeyangQin @loadams. Okay to force merge?

@loadams (Contributor) commented Aug 13, 2024

> I'm able to get both the nv-accelerate-v100 and nv-torch-latest-v100 workflows to pass with this branch on my local H100 node (see previous comment). /cc @tjruwase @HeyangQin @loadams. Okay to force merge?

I believe I've fixed our runners @jeffra - I'll monitor it today to be sure it gets merged.

@loadams enabled auto-merge Aug 13, 2024 23:04
@loadams added this pull request to the merge queue Aug 13, 2024
Merged via the queue into microsoft:master with commit 6e5d58d Aug 14, 2024
13 checks passed