Thanks for open-sourcing this excellent work!
I hope to learn more details about the instruction fine-tuning steps.
For example, for the editing task, are the default hyperparameters in train_seed_x_sft_edit.sh good enough? What is the total batch size? What computational resources are required?
Thanks a lot for your clarification!
Hi, for SFT on the editing task, we use 16 A100-40G GPUs with a total batch size of 320. Since we did not experiment with different hyperparameters for SFT, the default hyperparameters may not be optimal.
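For reference, here is a minimal sketch of how an effective batch size of 320 could be composed across 16 GPUs. The per-GPU batch size and gradient-accumulation values below are assumptions for illustration only, not the values in train_seed_x_sft_edit.sh:

```python
# Hypothetical decomposition of the reported total batch size of 320.
# The per-GPU batch size and gradient accumulation steps are assumed values;
# check train_seed_x_sft_edit.sh for the actual settings.
num_gpus = 16
per_gpu_batch_size = 10   # assumption
grad_accum_steps = 2      # assumption

total_batch_size = num_gpus * per_gpu_batch_size * grad_accum_steps
assert total_batch_size == 320
```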
At the beginning of training, it takes a few minutes to load the pre-trained model. If you see a very long wait caused by data loading, something is likely wrong.
Is there a recipe (e.g., learning rate) for instruction tuning on a relatively small dataset (e.g., Seed-X-PPT)? I'm trying to train on my custom dataset, but the model quickly overfits to the data within a few hundred iterations.
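For context, below is a generic sketch (not SEED-X's recipe) of early stopping on a held-out validation split, one common way to limit overfitting when fine-tuning on a small dataset. The training and evaluation functions are placeholders standing in for the actual SFT loop:

```python
import random

def train_one_epoch():
    pass  # placeholder: one pass over the small SFT dataset

def evaluate():
    # placeholder: return validation loss on a held-out split
    return random.random()

best_loss = float("inf")
patience, bad_epochs = 3, 0
for epoch in range(100):
    train_one_epoch()
    val_loss = evaluate()
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
        # save the best checkpoint here
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop before the model memorizes the small dataset
```

Lowering the learning rate and reducing the number of epochs are other common adjustments for small datasets, but the optimal values depend on the data.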