Weight norm #91

Closed
albertz opened this issue Jan 5, 2022 · 3 comments

Comments

albertz commented Jan 5, 2022

Implement weight norm.
The implementation can look very similar to PyTorch.

Weight dropout (#100) and other transformations or reparameterizations of weights would probably be implemented in a similar way.

The more generic issue is #59.
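For reference, weight norm reparameterizes a weight w into a direction v and a scale g, w = g * v / ||v||, with the norm taken per output unit. A minimal sketch of that computation in PyTorch terms (function name and shape conventions are illustrative, not a proposed API):

```python
import torch

def weight_norm_reparam(v: torch.Tensor, g: torch.Tensor, dim: int = 0) -> torch.Tensor:
    """Compute w = g * v / ||v||, with the norm taken over all axes except `dim`."""
    norm_dims = [d for d in range(v.dim()) if d != dim]
    v_norm = v.norm(2, dim=norm_dims, keepdim=True)  # shape: 1 on every axis except `dim`
    return g * v / v_norm  # g broadcasts against v along `dim`
```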

albertz commented Jan 5, 2022

Note that weight norm only makes sense on model parameters, not on auxiliary parameters (used e.g. for collecting running statistics). For that reason, the nn.Parameter auxiliary flag was introduced.
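So a generic helper should only touch non-auxiliary parameters. A rough sketch of that filtering (the `auxiliary` attribute follows the flag described above; `named_parameters()` and `transform` are illustrative placeholders, not a fixed interface):

```python
def apply_to_model_params(module, transform):
    """Apply `transform` (e.g. weight norm) to model parameters only,
    skipping auxiliary parameters such as batch-norm running statistics."""
    for name, param in module.named_parameters():
        if getattr(param, "auxiliary", False):
            continue  # auxiliary parameter: leave untouched
        transform(module, name, param)
```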

albertz mentioned this issue Mar 20, 2022
albertz changed the title from "How to implement weight norm" to "Weight norm" Apr 8, 2022

albertz commented Nov 11, 2022

While implementing weight norm, I stumbled upon the problem of weight decay on its parameterization (#241). Specifically, we would not want weight decay on g (or do we?).

In Lingvo, they solved this by parameterizing it as 1 + g instead (here). Do we also want this? Or do it just like layer norm?
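To make the difference concrete, a sketch of both variants (everything here is illustrative; `plus_one` is not an existing option):

```python
import torch

def effective_weight(v: torch.Tensor, g: torch.Tensor, dim: int = 0,
                     plus_one: bool = False) -> torch.Tensor:
    """w = scale * v / ||v||.  With plus_one=True the scale is (1 + g), so g can be
    zero-initialized and weight decay on g pulls the scale towards 1 instead of 0."""
    norm_dims = [d for d in range(v.dim()) if d != dim]
    scale = 1.0 + g if plus_one else g
    return scale * v / v.norm(2, dim=norm_dims, keepdim=True)
```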


albertz commented Nov 12, 2022

We now have an initial implementation, see nn.weight_norm.
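For comparison, the PyTorch utility this is modeled after is used as follows (how closely nn.weight_norm mirrors this interface is not implied here):

```python
import torch

layer = torch.nn.Linear(20, 40)
layer = torch.nn.utils.weight_norm(layer, name="weight", dim=0)
# The original `weight` is replaced by `weight_g` (scale) and `weight_v` (direction);
# the effective weight g * v / ||v|| is recomputed on every forward pass.
print(layer.weight_g.shape, layer.weight_v.shape)  # torch.Size([40, 1]) torch.Size([40, 20])
```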

I ignore some of the raised aspects here, namely:
