Adding the new feature of FPDT (#441) #70

Open
wants to merge 3 commits into base: main
4 changes: 2 additions & 2 deletions examples_deepspeed/finetune_hf_llama/README.md
@@ -10,9 +10,9 @@ The pre-trained weights can be found at [Hugging Face - LLAMA-7B](https://huggin

#### 1. Converting Hugging Face Model Weights to Megatron-Deepspeed Model
```bash
bash examples_deepspeed/finetune_hf_llama/finetune_llama.sh convert
bash examples_deepspeed/finetune_hf_llama/finetune_llama.sh convert_hf2mds
```
This command writes the Hugging Face model weights into the Megatron-Deepspeed model and saves it. You can adjust the parallel configuration in the script.
This command writes the Hugging Face model weights into the Megatron-Deepspeed model and saves it. You can adjust the parallel configuration in the script. `convert_mds2hf` can convert a Megatron-Deepspeed model back into the Hugging Face format.
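The README's code block above shows only the HF-to-Megatron direction; a minimal sketch of the reverse conversion mentioned in the prose, assuming it uses the same script interface as `convert_hf2mds`:

```bash
# Convert a Megatron-DeepSpeed checkpoint back into the Hugging Face format
bash examples_deepspeed/finetune_hf_llama/finetune_llama.sh convert_mds2hf
```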

#### 2. Fine-tuning Process
```bash
8 changes: 7 additions & 1 deletion examples_deepspeed/finetune_hf_llama/ds_config.json
@@ -1,5 +1,11 @@
{
"train_batch_size" : 256,
"train_micro_batch_size_per_gpu": 16,
"steps_per_print": 1
"steps_per_print": 100,
"zero_optimization": {
"stage": 0
},
"bf16": {
"enabled": true
}
}
5 changes: 5 additions & 0 deletions examples_deepspeed/finetune_hf_llama/ds_config_empty.json
@@ -0,0 +1,5 @@
{
"train_batch_size" : 256,
"train_micro_batch_size_per_gpu": 16,
"steps_per_print": 100
}
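This minimal config is the one the updated script selects for the two conversion modes (see the `DS_CONFIG_PATH` logic in the script diff below); it deliberately omits the ZeRO and bf16 settings used for fine-tuning. A hypothetical pre-flight check, not part of the PR, that the file parses as valid JSON:

```bash
# Hypothetical sanity check: confirm the minimal config is valid JSON before running a conversion
python -m json.tool examples_deepspeed/finetune_hf_llama/ds_config_empty.json
```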
10 changes: 9 additions & 1 deletion examples_deepspeed/finetune_hf_llama/finetune_llama.sh
@@ -43,6 +43,13 @@ cat <<EOT > $DS_CONFIG
}
EOT

if [ "$1" = "convert_hf2mds" ]; then
DS_CONFIG_PATH="./examples_deepspeed/finetune_hf_llama/ds_config_empty.json"
elif [ "$1" = "convert_mds2hf" ]; then
DS_CONFIG_PATH="./examples_deepspeed/finetune_hf_llama/ds_config_empty.json"
else
DS_CONFIG_PATH="./examples_deepspeed/finetune_hf_llama/ds_config.json"
fi
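Based on this selection logic, a sketch of how the first positional argument drives the choice of DeepSpeed config (paths as in this diff; the exact fine-tuning invocation may carry additional arguments):

```bash
# Both conversion modes use the minimal ds_config_empty.json
bash examples_deepspeed/finetune_hf_llama/finetune_llama.sh convert_hf2mds   # HF -> Megatron-DeepSpeed
bash examples_deepspeed/finetune_hf_llama/finetune_llama.sh convert_mds2hf   # Megatron-DeepSpeed -> HF

# Any other (or no) first argument falls through to the full ds_config.json used for fine-tuning
bash examples_deepspeed/finetune_hf_llama/finetune_llama.sh
```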

covert_hf2mds_args="deepspeed tools/hf2megads_weight_converter.py \
--hf-ckpt-num-shards 2 \
@@ -69,6 +76,7 @@ comm_args="--tensor-model-parallel-size $TP \
--num-layers $NUM_LAYERS \
--hidden-size $HIDDEN_SIZE \
--num-attention-heads $NUM_HEADS \
--finetune \
--ffn-hidden-size $FFN_HIDDEN_SIZE \
--attention-dropout 0 \
--hidden-dropout 0 \
@@ -97,7 +105,7 @@ comm_args="--tensor-model-parallel-size $TP \
--zero-stage 0 \
--tokenizer-type HFTokenizer \
--tokenizer-model $HF_LLAMA_PATH \
--deepspeed_config ./examples_deepspeed/finetune_hf_llama/ds_config.json \
--deepspeed_config $DS_CONFIG_PATH \
--deepspeed \
--distributed-backend nccl \
--num-workers 0 \