Training settings in the paper differ from the code #15

Open
aapanaetov opened this issue Oct 24, 2022 · 1 comment

aapanaetov commented Oct 24, 2022

Hi! Thank you for publishing such amazing work! In the paper you decay the learning rate after 6e5 steps, while in HQ_Dictionary.yaml the decay is set to 4e5 steps. The schedule steps, learning rate, and loss weights in the configs for both the HQ dictionary and RestoreFormer also differ from the paper. Which settings should I use to reproduce your excellent results?

@wzhouxiff
Owner

Please follow the settings described in the paper. Note that the learning rate set in the config is not the actual learning rate: it is divided by the number of GPUs used. The learning rate described in the paper is the value after that division.
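
To make the relationship concrete, here is a minimal sketch of the arithmetic, assuming a plain division by the GPU count as described in the reply above. The function names and the example numbers are hypothetical; they are not taken from the repository or the paper.

```python
def effective_learning_rate(config_lr: float, num_gpus: int) -> float:
    """Learning rate actually used, assuming config_lr is divided by num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return config_lr / num_gpus


def config_learning_rate(paper_lr: float, num_gpus: int) -> float:
    """Inverse mapping: the config value that yields paper_lr on num_gpus GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return paper_lr * num_gpus


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not from the paper or configs).
    example_config_lr = 8e-4
    example_num_gpus = 8
    print(effective_learning_rate(example_config_lr, example_num_gpus))
```

Under this assumption, training on a different number of GPUs than the authors used would mean rescaling the config value so that the effective learning rate still matches the paper.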
