Long sequence parallelism (Ulysses) integration with HuggingFace #5774
Conversation
How can this feature be enabled? Is there any documentation? I updated to DeepSpeed 0.15.1 and manually modified some transformers code according to your PR. During startup, the error "No sequence parallel group found" is raised.
@glowwormX - can you please open an issue with your questions? That's more likely to get traction than a comment here.
I tested Ulysses with
This PR enhances the capabilities of DeepSpeed long sequence (context) parallelism (aka DS Ulysses) with support for HuggingFace (and, by extension, other frameworks') models. With HF integration, users can apply sequence parallelism to model pre-, mid-, and post-training, fine-tuning, etc. Usage requires both torch >=2.2.2 and flash-attention. ZeRO-1 and ZeRO-2 are supported; ZeRO-3 and SDPA support are in progress. The corresponding PR in HF is PR32305.
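To make the mechanism behind Ulysses concrete, here is a minimal, framework-free sketch of the core idea: an all-to-all exchange that turns sequence-sharded activations into head-sharded activations, so each rank can then run ordinary (e.g. flash) attention over the full sequence for its subset of heads. This is an illustrative toy, not DeepSpeed's implementation; the sizes `P`, `SEQ`, and `HEADS` are arbitrary assumptions, and tokens are modeled as `(seq_pos, head)` tuples rather than tensors.

```python
# Toy model of the Ulysses all-to-all (not DeepSpeed's actual code).
# Before: rank r holds a [SEQ/P, HEADS] shard (a slice of the sequence,
# all heads). After: rank r holds [SEQ, HEADS/P] (the full sequence,
# a slice of the heads), which is what per-head attention needs.

P = 4            # sequence-parallel group size (assumed toy value)
SEQ, HEADS = 8, 4  # must both be divisible by P for this sketch

# Rank r initially owns sequence positions [r*SEQ//P, (r+1)*SEQ//P), all heads.
seq_shards = [
    [(s, h) for s in range(r * SEQ // P, (r + 1) * SEQ // P)
            for h in range(HEADS)]
    for r in range(P)
]

def all_to_all(shards):
    """Redistribute so rank r ends up with every sequence position but
    only heads [r*HEADS//P, (r+1)*HEADS//P)."""
    out = []
    for r in range(len(shards)):
        lo, hi = r * HEADS // P, (r + 1) * HEADS // P
        out.append(sorted((s, h) for shard in shards
                                 for (s, h) in shard if lo <= h < hi))
    return out

head_shards = all_to_all(seq_shards)

# Each rank now sees the complete sequence for its heads, so a local
# attention kernel can compute exact (non-approximate) attention.
assert all(len({s for s, _ in shard}) == SEQ for shard in head_shards)
```

In DeepSpeed itself this exchange is performed with `torch.distributed` all-to-all collectives inside the Ulysses attention layer, and a second all-to-all after attention restores the original sequence-sharded layout before the MLP.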