-
Hello everyone, I did some work to reproduce BERT pre-training from scratch; my code is mainly based on transformers, and I got the benchmark results below. To train faster, I chose max_seq_len=128 for all 1 million steps. Is that the main cause of the drop in the benchmark? Did Gluon train with max_seq_len=512 for all steps? Looking forward to your reply, thanks!
-
Hi @Jetcodery. Yes, I think the reduced sequence length would decrease performance. In the original experiment for reproducing BERT we trained with sequence length 512. Nowadays, many people train BERT in two stages: a first stage at length 128, followed by a second stage at length 512. The two-stage training appears to close the performance gap.
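
To make the two-stage idea concrete, here is a minimal sketch of such a schedule using the Hugging Face transformers and datasets libraries. Everything beyond the 128 → 512 length switch is an assumption for illustration: the wikitext-2 stand-in corpus, the batch size, learning rate, and the 900k/100k step split are placeholders, not the exact recipe used in the Gluon reproduction.

```python
# Minimal two-stage BERT pre-training sketch (assumed setup, not the exact
# recipe used for the Gluon numbers). Corpus, hyperparameters, and step
# split are illustrative placeholders.
from datasets import load_dataset
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM(BertConfig())   # randomly initialized, trained from scratch
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")  # stand-in corpus


def run_stage(max_seq_len, max_steps, output_dir):
    # Re-tokenize the corpus at the sequence length used for this stage.
    dataset = raw.map(
        lambda batch: tokenizer(
            batch["text"], truncation=True, max_length=max_seq_len, padding="max_length"
        ),
        batched=True,
        remove_columns=["text"],
    )
    args = TrainingArguments(
        output_dir=output_dir,
        max_steps=max_steps,
        per_device_train_batch_size=32,
        learning_rate=1e-4,
        warmup_steps=10_000,
    )
    # The same `model` object is passed to both stages, so stage 2 continues
    # training the weights produced by stage 1.
    Trainer(model=model, args=args, data_collator=collator, train_dataset=dataset).train()


# Stage 1: most of the steps at length 128, where attention is cheap.
run_stage(max_seq_len=128, max_steps=900_000, output_dir="stage1_len128")
# Stage 2: a shorter run at 512 so the model adapts to the full position range.
run_stage(max_seq_len=512, max_steps=100_000, output_dir="stage2_len512")
```

The key point is that both stages update the same model: the shorter second stage at length 512 lets the model learn the position embeddings beyond index 128 before it is fine-tuned on downstream tasks.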