When training with streaming on a relatively large dataset, the run hangs at evaluation time and cannot continue. Turning streaming off and using lazy tokenize instead is tens of times slower, so that is not usable either. For now, setting the eval strategy to no lets training proceed normally. Could you take a look when you have time? Thanks!
Below is the training script that hangs. The default is eval_steps=50, so it gets stuck at step 50; after running for a while it may also hit a GPU OOM.
```shell
swift sft \
    --model_id_or_path qwen/Qwen2-1.5B-Instruct \
    --use_flash_attn False \
    --num_train_epochs 5 \
    --batch_size 2 \
    --save_total_limit -1 \
    --sft_type lora \
    --dtype fp32 \
    --lazy_tokenize False \
    --streaming True \
    --preprocess_num_proc 8 \
    --gradient_accumulation_steps 48 \
    --max_steps 10000 \
    --max_length 512 \
    --truncation_strategy delete
```
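The workaround mentioned above is to disable periodic evaluation so the run never reaches the eval step that hangs. A minimal sketch, assuming swift sft exposes a transformers-style `--evaluation_strategy` flag (flag name not verified here; check `swift sft --help` for your version):

```shell
# Assumed workaround: skip in-training evaluation entirely so the
# streaming run never enters the eval loop that hangs at step 50.
# --evaluation_strategy is assumed to mirror transformers' TrainingArguments.
swift sft \
    --model_id_or_path qwen/Qwen2-1.5B-Instruct \
    --streaming True \
    --evaluation_strategy no \
    --max_steps 10000
```

This trades in-training eval metrics for a run that completes; evaluation can still be done separately on the saved checkpoints afterwards.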