Hi, thanks for your code and work.
I read in another issue (#6) that the main training runs for 260 epochs with 3771 samples per epoch. That should be 260 × 3771 / 4 (batch size) ≈ 245K iterations, while pre-training runs for 2M iterations. Why would pre-training take just 4 days but main training take 3 days, as mentioned in the paper, given that each iteration should take approximately the same amount of time?
Am I missing something? I am trying to re-train the network, but 260 epochs seems insufficient. Thanks a lot!
Hi @hkchengrex, thanks for pointing out my mistake in the previous answer.
In fact, pre-training runs for 2M samples, not iterations, which works out to about 500K iterations with a batch size of 4. The training times reported in the paper are rough estimates rather than precise measurements; I am sorry if that caused any misunderstanding. You are right that pre-training takes roughly twice as long as fine-tuning. In our implementation, 260 epochs of fine-tuning is sufficient because we reduce the learning rate at regular intervals.
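For anyone else checking this arithmetic, here is a minimal sketch (assuming the batch size of 4 mentioned above; the numbers come from this thread, not from timing the code) showing why pre-training works out to roughly twice the iterations of fine-tuning:

```python
# Back-of-the-envelope iteration counts (assumption: batch size 4, per this thread).
batch_size = 4

# Fine-tuning (main training): 260 epochs x 3771 samples per epoch.
ft_epochs = 260
ft_samples_per_epoch = 3771
ft_iterations = ft_epochs * ft_samples_per_epoch // batch_size
print(f"fine-tuning: ~{ft_iterations:,} iterations")      # ~245,115

# Pre-training: 2M *samples*, not iterations.
pretrain_samples = 2_000_000
pretrain_iterations = pretrain_samples // batch_size
print(f"pre-training: ~{pretrain_iterations:,} iterations")  # ~500,000
```

With ~500K pre-training iterations versus ~245K fine-tuning iterations, the roughly 4-day vs. 3-day split in the paper is consistent with the rough timing estimates described above.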