What about reducing the batch size? #82
Comments
Hi @ywen666,
Hi, thanks for the suggestion! I also share a question with the other issue about the slow evaluation. Running python seq2seq/run_seq2seq.py configs/eval.json takes 9 hours to complete on a 4-GPU (RTX 5000) node, at about 200s per iteration, which seems too slow. If I disable Picard and run python seq2seq/run_seq2seq.py configs/nopicard_eval.json, it is pretty fast and finishes in 8 minutes, at about 1.6s per iteration. I wonder what the best way is to check which part bottlenecks the inference speed?
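For anyone wanting to narrow this down themselves, here is a minimal sketch (not from the repo) of profiling one evaluation run with Python's built-in cProfile and then inspecting the hottest call sites; the profile file name is an arbitrary choice.

```python
# A minimal sketch, not part of the repo: run the evaluation under cProfile,
#   python -m cProfile -o eval.prof seq2seq/run_seq2seq.py configs/eval.json
# then load the resulting profile and look at which call sites dominate the runtime.
import pstats

stats = pstats.Stats("eval.prof")
stats.sort_stats("cumulative").print_stats(20)  # 20 slowest call sites by cumulative time
```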
There were some changes to the parser recently that may have resulted in a performance regression. I suspect that this is the cause of the slowdown. When I have the time, I'll look into this.
You could help me out by telling me which input-output pairs take the longest to generate.
I am looking into this but it will take some time for me to figure it out.
Hi, I am trying to time the generation of each example. I found the generate method wrapper for the SpiderModel in https://github.com/ElementAI/picard/blob/main/seq2seq/utils/picard_model_wrapper.py. However, I couldn't find the code that makes use of this generate method. I checked the trainer and SpiderTrainer, and it seems evaluate never uses the generate method, nor does the evaluation_loop inside it, which comes from the Hugging Face Trainer. Could you please give a pointer to where the generate wrapper is used in the repo?
Oh, generate is invoked in the Seq2SeqTrainer's prediction_step method.
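For reference, a hedged sketch of how one might time each generate() call from the outside; the patch point (the model object used by Seq2SeqTrainer) is an assumption and may need adjusting to the actual wrapper in picard_model_wrapper.py.

```python
import functools
import time

def timed_generate(generate_fn):
    """Wrap a generate() callable so every call reports its wall-clock time."""
    @functools.wraps(generate_fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = generate_fn(*args, **kwargs)
        print(f"generate() took {time.perf_counter() - start:.2f}s")
        return output
    return wrapper

# Hypothetical patch applied before calling trainer.evaluate(); the attribute
# name depends on how the Picard wrapper exposes generate():
# model.generate = timed_generate(model.generate)
```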
Hi!
Thanks so much, this information will help me with the root cause analysis for the speed regression!
Hi,
Thanks for releasing this amazing code repo!
The paper mentions that the batch size is 2k, leading to 410 gradient accumulation steps, which makes fine-tuning a bit too slow. I wonder whether the authors have tried reducing the batch size, for example to 128 as suggested in the T5 paper? Does this degrade the performance a lot?
Thanks!
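As a rough illustration of the arithmetic behind this question (the per-device values below are hypothetical, not taken from the repo's configs): the effective batch size is the per-device batch size times the number of devices times the number of gradient accumulation steps, so reducing the accumulation steps is the usual way to shrink it.

```python
def effective_batch_size(per_device_batch_size, num_devices, gradient_accumulation_steps):
    """Effective optimizer-step batch size under gradient accumulation."""
    return per_device_batch_size * num_devices * gradient_accumulation_steps

# Hypothetical numbers for illustration only:
print(effective_batch_size(5, 1, 410))  # 2050, roughly the 2k quoted from the paper
print(effective_batch_size(8, 4, 4))    # 128, the T5-style batch size asked about
```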