What about reducing the batch size? #82
Comments
Hi @ywen666,
Hi, thanks for the suggestion! I also share a question with the other issue about the slow evaluation. Running python seq2seq/run_seq2seq.py configs/eval.json takes 9 hours to complete on a 4-GPU (RTX 5000) node, at about 200s per iteration, which seems too slow. If I disable Picard and run python seq2seq/run_seq2seq.py configs/nopicard_eval.json, it is pretty fast and finishes in 8 minutes, at about 1.6s per iteration. I wonder what the best way is to check which part bottlenecks the inference speed?
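For anyone wanting to narrow this down themselves, here is a minimal sketch (not from the repo) of profiling one evaluation run with Python's built-in cProfile and then inspecting the hottest call sites; the profile file name is an arbitrary choice.

```python
# A minimal sketch, not part of the repo: run the evaluation under cProfile,
#   python -m cProfile -o eval.prof seq2seq/run_seq2seq.py configs/eval.json
# then load the resulting profile and look at which call sites dominate the runtime.
import pstats

stats = pstats.Stats("eval.prof")
stats.sort_stats("cumulative").print_stats(20)  # 20 slowest call sites by cumulative time
```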
There were some changes to the parser recently that may have resulted in a performance regression. I suspect that this is the cause of the slowdown. When I have the time, I'll look into this.
You could help me out by telling me which input-output pairs take the longest to generate.
I am looking into this but it will take some time for me to figure it out.
Hi, I am trying to time the generation of each example. I found the generate method wrapper for the SpiderModel in https://github.com/ElementAI/picard/blob/main/seq2seq/utils/picard_model_wrapper.py. However, I couldn't find the code that makes use of this generate method. I checked the trainer and SpiderTrainer, and it seems evaluate never uses the generate method, nor does the evaluation_loop inside it, which comes from the Hugging Face Trainer. Could you please give a pointer to where the generate wrapper is used in the repo?
Oh, generate is invoked in the Seq2SeqTrainer's prediction_step method.
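For reference, a hedged sketch of how one might time each generate() call from the outside; the patch point (the model object used by Seq2SeqTrainer) is an assumption and may need adjusting to the actual wrapper in picard_model_wrapper.py.

```python
import functools
import time

def timed_generate(generate_fn):
    """Wrap a generate() callable so every call reports its wall-clock time."""
    @functools.wraps(generate_fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = generate_fn(*args, **kwargs)
        print(f"generate() took {time.perf_counter() - start:.2f}s")
        return output
    return wrapper

# Hypothetical patch applied before calling trainer.evaluate(); the attribute
# name depends on how the Picard wrapper exposes generate():
# model.generate = timed_generate(model.generate)
```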
Hi!
Thanks so much, this information will help me with the root cause analysis for the speed regression!
Hi,
Thanks for releasing this amazing code repo!
The paper mentions that the batch size is 2k, leading to 410 gradient accumulation steps, which makes fine-tuning a bit too slow. I wonder whether the authors have tried reducing the batch size, for example to 128 as suggested in the T5 paper? Does this degrade the performance a lot?
Thanks!
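As a rough illustration of the arithmetic behind this question (the per-device values below are hypothetical, not taken from the repo's configs): the effective batch size is the per-device batch size times the number of devices times the number of gradient accumulation steps, so reducing the accumulation steps is the usual way to shrink it.

```python
def effective_batch_size(per_device_batch_size, num_devices, gradient_accumulation_steps):
    """Effective optimizer-step batch size under gradient accumulation."""
    return per_device_batch_size * num_devices * gradient_accumulation_steps

# Hypothetical numbers for illustration only:
print(effective_batch_size(5, 1, 410))  # 2050, roughly the 2k quoted from the paper
print(effective_batch_size(8, 4, 4))    # 128, the T5-style batch size asked about
```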