# Commit 42674bd: [example] updated hybrid model parallel GPT pretraining examples

1 parent: ef4b99e. Showing 9 changed files with 2,851 additions and 0 deletions.
## examples/language/gpt/experiments/hybrid_parallel/Dockerfile (20 additions)
```dockerfile
FROM nvcr.io/nvidia/pytorch:22.12-py3

WORKDIR /workspace

RUN pip install -U --no-cache-dir torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 torchtext torchdata --index-url https://download.pytorch.org/whl/cu118

RUN pip install -U --no-cache-dir transformers datasets

RUN pip uninstall -y apex && git clone https://github.com/NVIDIA/apex.git && cd apex && \
    python setup.py install --cpp_ext --cuda_ext --fast_layer_norm --fmha --xentropy --fast_multihead_attn

RUN pip install -U --no-cache-dir ninja && \
    pip install -v -U --no-cache-dir git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

RUN git clone https://github.com/Dao-AILab/flash-attention.git && \
    cd flash-attention && python setup.py install && \
    cd csrc/rotary && python setup.py install

RUN git clone https://github.com/kurisusnowdeng/ColossalAI.git && cd ColossalAI && \
    CUDA_EXT=1 pip install -U -v --no-cache-dir -e .
```
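A minimal build-and-run sketch for this image; the tag `gpt-hybrid` and the `/data` mount path are illustrative choices, not part of the commit. Note that the build clones sources into `/workspace`, so the host directory is mounted elsewhere to avoid shadowing the editable ColossalAI install:

```shell
# Build from the directory containing the Dockerfile.
docker build -t gpt-hybrid .

# Enter the container with all GPUs visible; --ipc=host is the usual
# recommendation for PyTorch DataLoader workers in NGC-based images.
docker run --rm -it --gpus all --ipc=host \
    -v "$PWD":/data \
    gpt-hybrid bash
```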
## examples/language/gpt/experiments/hybrid_parallel/README.md (90 additions)
# GPT2 benchmark

## Preparation

### Dependencies

Install apex:

```shell
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v -U --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" .
```
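A quick post-install check that the CUDA extensions actually built; `FusedLayerNorm` lives in `apex.normalization`, so a failing import here usually means the build fell back to a Python-only install:

```shell
python -c "from apex.normalization import FusedLayerNorm; print('apex OK')"
```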
Install xformers:

```shell
pip install ninja
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
pip install -v -U .
```
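xformers ships a diagnostic entry point that lists which attention kernels were compiled; it is a convenient way to confirm the source build picked up your CUDA toolchain:

```shell
python -m xformers.info
```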
Install bitsandbytes (e.g. for CUDA 11.8):

```shell
git clone https://github.com/timdettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x
python setup.py install
```
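Running the bitsandbytes module directly prints its CUDA setup diagnostics (which binary was loaded, which CUDA version it detected); the exact output varies across versions:

```shell
python -m bitsandbytes
```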
Install ColossalAI:

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
CUDA_EXT=1 pip install -v -U .
```
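The install also provides the `colossalai` CLI; `colossalai check -i` reports the installed version and whether the CUDA extensions compiled (flag as of the ColossalAI CLI of this era; newer releases may differ):

```shell
colossalai check -i
```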
### Dataset

Download and preprocess the OpenWebText corpus:

```shell
pip install -U transformers datasets
python process_data.py --output-path /PATH/TO/PROCESSED/OPENWEBTEXT
```
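Once the script finishes, the processed corpus sits under the output path; a quick size check that makes no assumptions about the on-disk format:

```shell
du -sh /PATH/TO/PROCESSED/OPENWEBTEXT
```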
## Usage

### PyTorch FSDP

```shell
OMP_NUM_THREADS=128 torchrun --nproc_per_node 8 --master_port 23333 train_torch.py \
    --data-path /PATH/TO/PROCESSED/OPENWEBTEXT \
    --model gpt2-10b \
    --max-iters 10 --eval-iters 1 --warmup-iters 0 \
    --batch-size 4 --global-batch-size 128 \
    --optim AdamW \
    --dtype float16 \
    --recompute \
    --zero-stage 3
```
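For reference, assuming `train_torch.py` derives gradient accumulation from these flags (an assumption; check the script), the per-step arithmetic is: 4 samples per GPU × 8 GPUs = 32 samples per micro-step, so a global batch of 128 implies 4 accumulation steps.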
### ColossalAI Gemini

```shell
OMP_NUM_THREADS=128 torchrun --nproc_per_node 8 --master_port 23333 train_gemini.py \
    --data-path /PATH/TO/PROCESSED/OPENWEBTEXT \
    --model gpt2-10b \
    --max-iters 10 --eval-iters 1 --warmup-iters 0 \
    --batch-size 4 \
    --optim AdamW \
    --dtype float16 \
    --recompute \
    --flash \
    --zero-stage 3
```
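The `--flash` flag presumably routes attention through the FlashAttention kernels built in the Dockerfile above; that reading is an assumption to confirm against `train_gemini.py`.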
### ColossalAI Tensor Parallelism

```shell
OMP_NUM_THREADS=128 torchrun --nproc_per_node 8 --master_port 23333 train_col.py \
    --data-path /PATH/TO/PROCESSED/OPENWEBTEXT \
    --model gpt2-10b \
    --max-iters 10 --eval-iters 1 --warmup-iters 0 \
    --batch-size 4 --global-batch-size 128 \
    --optim AdamW \
    --dtype float16 --amp-level 2 \
    --recompute \
    --flash \
    --tp 1d --tp-size 4 \
    --zero-stage 3
```
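With 8 processes and `--tp-size 4`, 1D tensor parallelism splits each weight matrix across 4 ranks, leaving 8 / 4 = 2 data-parallel groups; exactly how `train_col.py` composes these process groups with ZeRO stage 3 is an assumption to verify in the script.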