Thanks for open-sourcing this excellent work!
I hope to learn more details about the instruction fine-tuning steps.
For example, for the editing task, are the default hyperparameters in train_seed_x_sft_edit.sh good enough? What is the total batch size? What computational resources are required?
Thanks a lot for your clarification!
Hi, for SFT on the editing task, we use 16 A100-40G GPUs with a total batch size of 320. Since we did not experiment with different hyperparameters for SFT, the default hyperparameters may not be optimal.
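For reference, here is a minimal sketch of how an effective batch size of 320 could be composed across 16 GPUs. The per-GPU batch size and gradient-accumulation values below are assumptions for illustration only, not the values in train_seed_x_sft_edit.sh:

```python
# Hypothetical decomposition of the reported total batch size of 320.
# The per-GPU batch size and gradient accumulation steps are assumed values;
# check train_seed_x_sft_edit.sh for the actual settings.
num_gpus = 16
per_gpu_batch_size = 10   # assumption
grad_accum_steps = 2      # assumption

total_batch_size = num_gpus * per_gpu_batch_size * grad_accum_steps
assert total_batch_size == 320
```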
At the beginning of training, it takes a few minutes to load the pre-trained model. If you see a very long wait caused by data loading, something is likely wrong.
Is there a recipe (e.g., learning rate) for instruction tuning on a relatively small dataset (e.g., Seed-X-PPT)? I'm trying to train on my custom dataset, but the model quickly overfits to the data within a few hundred iterations.
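For context, below is a generic sketch (not SEED-X's recipe) of early stopping on a held-out validation split, one common way to limit overfitting when fine-tuning on a small dataset. The training and evaluation functions are placeholders standing in for the actual SFT loop:

```python
import random

def train_one_epoch():
    pass  # placeholder: one pass over the small SFT dataset

def evaluate():
    # placeholder: return validation loss on a held-out split
    return random.random()

best_loss = float("inf")
patience, bad_epochs = 3, 0
for epoch in range(100):
    train_one_epoch()
    val_loss = evaluate()
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
        # save the best checkpoint here
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop before the model memorizes the small dataset
```

Lowering the learning rate and reducing the number of epochs are other common adjustments for small datasets, but the optimal values depend on the data.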