Batch size and training epochs do not match the paper #103

Sumutan · 2023-05-29T12:23:30Z

Thank you for your open-source project!
The fine-tuning script that corresponds to the 1600-epoch pretraining in your provided scripts differs from the configuration given in the paper's appendix:
1. The total batch size is 512 in the paper (8 per-GPU batch size × 8 nodes × 8 GPUs), but 256 in the script (2 per-GPU batch size × 2 num_sample × 8 nodes × 8 GPUs).
2. The number of training epochs is reduced from 75 in the paper to 35 in the script.
Is it possible to achieve similar results with these differences?
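The batch-size arithmetic in item 1 can be checked with a short sketch. It assumes, as the issue states, that the global batch size is the per-GPU batch size multiplied by `num_sample`, the number of nodes, and the GPUs per node. The helper name `global_batch_size` is hypothetical, and `num_sample=1` for the paper's setting is an assumption, since the paper's figure lists no such factor.

```python
# Hypothetical helper: global batch size as described in the issue,
# i.e. per-GPU batch x num_sample x nodes x GPUs per node.
def global_batch_size(per_gpu_batch: int, num_sample: int,
                      nodes: int, gpus_per_node: int) -> int:
    return per_gpu_batch * num_sample * nodes * gpus_per_node


# Paper setting (assumes num_sample = 1, since none is listed).
paper = global_batch_size(per_gpu_batch=8, num_sample=1, nodes=8, gpus_per_node=8)
# Script setting, as reported in the issue.
script = global_batch_size(per_gpu_batch=2, num_sample=2, nodes=8, gpus_per_node=8)

print(paper, script)  # 512 256
```

Under these assumptions, the script uses half of the paper's global batch size.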
