Details about Instruction FT #19

Open
Ctrl-C-V4ever opened this issue Jul 10, 2024 · 3 comments

@Ctrl-C-V4ever

Thanks for open-sourcing this excellent work!
I'd like to learn more about the instruction fine-tuning steps.
For example, for the editing task, are the default hyperparameters in train_seed_x_sft_edit.sh good enough? What is the total batch size? How much compute is required?
Thanks a lot for the clarification!

@Ctrl-C-V4ever
Author

BTW, I'm experiencing a very long wait (~10 min) for the first batch of data to load. Is this normal? Please let me know. Thanks in advance!

@geyuying
Collaborator

Hi, for SFT for editing, we use 16 A100-40G GPUs with a total batch size of 320. Since we did not experiment with different hyperparameters for SFT, the defaults may not be optimal.
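
For concreteness, here is a minimal sketch (not from the repo) of one way a total batch size of 320 could decompose across 16 GPUs; the per-GPU batch size and gradient-accumulation values below are illustrative assumptions, not the actual settings in train_seed_x_sft_edit.sh:

```python
# Minimal sketch (not from the repo): one way a total batch size of 320
# could decompose across 16 GPUs. The per-GPU batch size and gradient
# accumulation steps below are illustrative assumptions.
num_gpus = 16               # A100-40G, per the comment above
per_gpu_batch_size = 5      # assumed
grad_accum_steps = 4        # assumed

total_batch_size = num_gpus * per_gpu_batch_size * grad_accum_steps
assert total_batch_size == 320
print(f"effective batch size: {total_batch_size}")
```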

At the beginning of training, it takes a few minutes to load the pre-trained model. If the long wait is instead caused by loading data, that is not normal.
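
To tell the two apart, one hedged sketch (assuming a standard PyTorch DataLoader, not necessarily the repo's actual loader) is to time the first batch in isolation:

```python
# Hedged sketch, assuming a standard PyTorch DataLoader (not necessarily
# the repo's loader): time the first batch in isolation to distinguish a
# slow dataloader from slow model loading.
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; substitute the actual editing SFT dataset here.
dataset = TensorDataset(torch.randn(10_000, 16))

loader = DataLoader(
    dataset,
    batch_size=20,
    num_workers=8,            # worker startup can add seconds, not minutes
    pin_memory=True,
    persistent_workers=True,  # avoids re-spawning workers every epoch
)

t0 = time.time()
first_batch = next(iter(loader))
print(f"first batch loaded in {time.time() - t0:.1f}s")
```

If this stand-in loads quickly but the real dataset does not, the bottleneck is in the dataset's own I/O or preprocessing rather than the training setup.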

@naoto0804

Is there any recipe (e.g., learning rate) for instruction tuning on a relatively small dataset (e.g., Seed-X-PPT)? I'm trying to train on my custom dataset, but the model overfits the data within a few hundred iterations.
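
Not an official recipe, but common mitigations when SFT overfits a small dataset within a few hundred steps are lowering the peak learning rate, adding weight decay, and stopping early on validation loss. A minimal PyTorch sketch, with all names and values as placeholders:

```python
# Not an official recipe — an illustrative PyTorch sketch of common
# mitigations for rapid overfitting on a small SFT set: a lower peak
# learning rate, weight decay, and early stopping on validation loss.
# All names and values here are placeholders.
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in; substitute the actual model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,             # assumed: smaller than a typical SFT learning rate
    weight_decay=0.05,
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500)

# Early stopping: halt once validation loss stops improving.
best_val, bad_evals, patience = float("inf"), 0, 3
for val_loss in [2.0, 1.5, 1.4, 1.45, 1.5, 1.6]:  # placeholder eval losses
    if val_loss < best_val:
        best_val, bad_evals = val_loss, 0
    else:
        bad_evals += 1
        if bad_evals >= patience:
            break
```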
