Training with streaming hangs at eval #2455

Open
1215thebqtic opened this issue Nov 15, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@1215thebqtic

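When I train with streaming on a fairly large dataset, training hangs at eval and cannot continue. Turning streaming off and using lazy tokenize instead is dozens of times slower, so that is not usable either. For now, setting the eval strategy to no lets training run normally. Could you take a look when you have time? Thanks!

Below is the training script that hangs. eval_steps defaults to 50, so it hangs at step 50. If it is left running long enough, a GPU OOM may also occur.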
swift sft \
  --model_id_or_path qwen/Qwen2-1.5B-Instruct \
  --use_flash_attn False \
  --num_train_epochs 5 \
  --batch_size 2 \
  --save_total_limit -1 \
  --sft_type lora \
  --dtype fp32 \
  --lazy_tokenize False \
  --streaming True \
  --preprocess_num_proc 8 \
  --gradient_accumulation_steps 48 \
  --max_steps 10000 \
  --max_length 512 \
  --truncation_strategy delete
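
For anyone trying to reproduce this, a rough back-of-the-envelope sketch of how much data is consumed before the first eval is triggered. The numbers come from the command above. Two things are assumptions rather than facts from the report: a single device (`num_devices` is a hypothetical variable, not a swift flag), and that eval_steps counts optimizer steps, as it does in the Hugging Face Trainer.

```python
# Values taken from the training command in this issue.
batch_size = 2                    # --batch_size (per device)
gradient_accumulation_steps = 48  # --gradient_accumulation_steps
eval_steps = 50                   # default eval_steps reported in the issue
max_steps = 10000                 # --max_steps

# Assumption: a single device. Multiply by the real device count otherwise.
num_devices = 1

# Samples consumed per optimizer step.
samples_per_step = batch_size * gradient_accumulation_steps * num_devices

# Samples read from the stream before the first eval (step 50) starts.
samples_before_first_eval = samples_per_step * eval_steps

# Number of eval rounds a full run would trigger if none of them hung.
eval_runs = max_steps // eval_steps

print(samples_per_step)
print(samples_before_first_eval)
print(eval_runs)
```

Under those assumptions, a small sample of the dataset that covers the first 50 steps should be enough to reach the hang without waiting on the full data.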

@tastelikefeet tastelikefeet added the bug Something isn't working label Nov 18, 2024