This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

How to get such a high benchmark on glue-task with Gluon-bert? #1544

Answered by szha
Jetcodery asked this question in Q&A

Hi @Jetcodery. Yes, I think the shorter sequence length would hurt performance. In our original experiment reproducing BERT, we trained with sequence length 512. Nowadays, many people train BERT in two stages: a first stage with sequence length 128, followed by a second stage with sequence length 512. The two-stage training appears to close the performance gap.
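
To make the two-stage idea concrete, here is a minimal sketch of a schedule that picks the maximum sequence length from the training step. It is not code from GluonNLP: the names `Stage`, `two_stage_schedule`, and `seq_len_for_step` are hypothetical. The 90%/10% split between stages and the step counts are assumptions for illustration; only the 128 and 512 lengths come from the answer above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One pretraining stage: a maximum sequence length held for num_steps updates."""
    max_seq_len: int
    num_steps: int


def two_stage_schedule(total_steps: int, stage1_percent: int = 90) -> List[Stage]:
    """Split total_steps into a length-128 stage followed by a length-512 stage.

    stage1_percent is an assumed value, not something stated in the thread.
    """
    if not 0 <= stage1_percent <= 100:
        raise ValueError("stage1_percent must be between 0 and 100")
    stage1_steps = total_steps * stage1_percent // 100
    return [
        Stage(max_seq_len=128, num_steps=stage1_steps),
        Stage(max_seq_len=512, num_steps=total_steps - stage1_steps),
    ]


def seq_len_for_step(step: int, schedule: List[Stage]) -> int:
    """Return the maximum sequence length to use at a zero-based training step."""
    remaining = step
    for stage in schedule:
        if remaining < stage.num_steps:
            return stage.max_seq_len
        remaining -= stage.num_steps
    raise ValueError("step is beyond the end of the schedule")


if __name__ == "__main__":
    schedule = two_stage_schedule(total_steps=1000)
    for step in (0, 899, 900, 999):
        print(step, seq_len_for_step(step, schedule))
```

In a real pretraining run, each stage would typically use data preprocessed for its own sequence length, and the second stage would resume from the first stage's checkpoint.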

Answer selected by Jetcodery