Thank you for your open source project!
The fine-tuning script that corresponds to the 1600-epoch pretrained model in your provided scripts differs from the configuration given in the appendix of the paper:
1. The total batch size is 512 in the paper (8 batch size * 8 nodes * 8 GPUs), but 256 in the script (2 batch size * 2 num_sample * 8 nodes * 8 GPUs).
2. The number of training epochs was reduced from 75 in the paper to 35 in the script.
Would it be possible to achieve similar results with these differences?
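A quick sketch of the arithmetic behind the two settings. It assumes that the effective batch size is simply the product of the factors listed above, and that one epoch means one pass over the same fine-tuning set in both setups. Under that assumption the dataset size cancels out of the ratios. `effective_batch_size` is a hypothetical helper for illustration and is not part of the repository.

```python
def effective_batch_size(batch_per_gpu, gpus_per_node, nodes, num_sample=1):
    """Samples per optimizer step, counted the same way as in the issue."""
    return batch_per_gpu * num_sample * gpus_per_node * nodes

# Settings quoted in the issue (paper appendix vs. provided script).
paper_bs = effective_batch_size(batch_per_gpu=8, gpus_per_node=8, nodes=8)
script_bs = effective_batch_size(batch_per_gpu=2, gpus_per_node=8, nodes=8, num_sample=2)
paper_epochs, script_epochs = 75, 35

# Ratio of passes over the data (script relative to paper).
epoch_ratio = script_epochs / paper_epochs
# Steps per epoch scale as dataset_size / batch_size, so the dataset size cancels:
# total_steps ratio = epoch_ratio * (paper_bs / script_bs).
step_ratio = epoch_ratio * (paper_bs / script_bs)

print(f"paper:  batch {paper_bs}, {paper_epochs} epochs")
print(f"script: batch {script_bs}, {script_epochs} epochs")
print(f"epoch ratio (script/paper): {epoch_ratio:.3f}")
print(f"optimizer-step ratio (script/paper): {step_ratio:.3f}")
```

So, if the assumptions hold, the script takes roughly the same number of optimizer steps as the paper (about 93%) but makes less than half as many passes over the data.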