When training visualglm-6B with DeepSpeed ZeRO stage 2, the final checkpoint ends up being 25 GB #182

Open
corkiyao opened this issue Sep 9, 2024 · 0 comments


corkiyao commented Sep 9, 2024

gpt_options=" \
       --experiment-name finetune-$MODEL_TYPE \
       --model-parallel-size ${MP_SIZE} \
       --mode finetune \
       --train-iters 100 \
       --resume-dataloader \
       $MODEL_ARGS \
       --train-data ${train_data} \
       --valid-data ${eval_data} \
       --distributed-backend nccl \
       --lr-decay-style cosine \
       --warmup .02 \
       --checkpoint-activations \
       --save-interval 1200 \
       --eval-interval 10000 \
       --save "./checkpoints" \
       --split 1 \
       --eval-iters 10 \
       --eval-batch-size 1 \
       --zero-stage 2 \
       --lr 0.0001 \
       --batch-size 1 \
       --skip-init \
       --fp16 \
       --use_lora
"

As shown above, this is stage-2 training using finetune_visualglm.py with LoRA fine-tuning; nothing else in the code was modified. Yet the resulting checkpoint comes out at 25 GB, far larger than the officially released weights. Could someone explain what is going on here? When training with stage 1 instead of stage 2, the checkpoint is only around 14.5 GB.
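For reference, here is a small sketch I would use to check what the saved file actually contains. One possible cause is that under ZeRO stage 2 the optimizer state (fp32 master weights, Adam moments) gets saved alongside the fp16 model weights, which would account for the extra size. The checkpoint path below is an assumption (DeepSpeed's usual naming); point it at whatever .pt files appear under ./checkpoints:

```python
# Sketch: break down what the 25 GB checkpoint file contains.
# The path is an assumption -- adjust to the actual file saved
# under ./checkpoints by the finetune run.
import torch

path = "checkpoints/finetune-visualglm-6b/1200/mp_rank_00_model_states.pt"
ckpt = torch.load(path, map_location="cpu")

def tensor_gb(obj):
    """Recursively sum tensor sizes (in GB) inside nested containers."""
    if torch.is_tensor(obj):
        return obj.numel() * obj.element_size() / 1024**3
    if isinstance(obj, dict):
        return sum(tensor_gb(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(tensor_gb(v) for v in obj)
    return 0.0

# Top-level keys show whether optimizer state is bundled with the weights.
for key, value in ckpt.items():
    print(f"{key}: ~{tensor_gb(value):.2f} GB")
```

If most of the size sits under optimizer-related keys, or in separate `*_optim_states.pt` files next to the model states, then the 25 GB is training state rather than inference weights. In that case, if the checkpoint was saved through DeepSpeed's engine, a `zero_to_fp32.py` helper is usually dropped into the checkpoint directory and can reconstruct standalone model weights without the optimizer state.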
