Control over batch-norm running_mean/var buffers #767

Closed
adefazio opened this issue May 30, 2024 · 3 comments
Comments

@adefazio
Contributor


Following up on the request in the recent working group meeting regarding future improvements to the challenge, it would be extremely useful to have control over the running_mean/var buffers of batch-norm layers. Currently, if different iterates are used for evaluation and training (e.g. when EMA or Schedule-Free averaging is used), the running_mean/var values will be incorrect for the evaluated iterate, since they are averaged over forward passes of the training iterates.
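To make the mismatch concrete, here is a scalar toy in plain Python. It is only an illustration, not benchmark code: the pre-activation is `h = w * x` with `E[x] = 1`, so the batch mean of `h` is just `w`, and the drift rate and `ema_decay` value are made up for the example.

```python
# Toy illustration (assumed values): pre-activation h = w * x with E[x] = 1,
# so the batch mean that batch norm sees at each step is simply w.
bn_momentum = 0.1   # PyTorch-style momentum: new = (1 - m) * old + m * batch
ema_decay = 0.99    # weight-averaging decay, hypothetical value

w, w_ema, running_mean = 0.0, 0.0, 0.0
for step in range(100):
    w += 0.1                                        # training iterate drifts
    w_ema = ema_decay * w_ema + (1 - ema_decay) * w  # averaged (eval) iterate
    # The BN buffer is updated from the *training* iterate's activations.
    running_mean = (1 - bn_momentum) * running_mean + bn_momentum * w

# Statistic BN would need when the model is evaluated at w_ema.
true_mean_under_ema = w_ema
gap = running_mean - true_mean_under_ema
print(f"running_mean={running_mean:.3f}, needed={true_mean_under_ema:.3f}")
```

The buffer follows the recent training weights closely, while the averaged weights lag far behind, so the stored mean does not describe the activations of the model that is actually evaluated.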

This, together with the eval() support requested in #758, would make it much easier to implement averaging approaches.

In terms of control, it would be useful to be able to turn the updating of the running mean/var during forward passes on and off, and to directly access their values. Currently there is an update_batch_norm switch that calls update_batch_norm_fn in pytorch_utils, but it doesn't allow us to update the batch-norm stats when in eval mode. Eval mode changes the behavior of dropout, so we want to be in eval mode when updating BN statistics right before a model evaluation.
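A rough sketch of the behavior being asked for, in plain PyTorch. `refresh_bn_stats` is a hypothetical helper, not part of the benchmark API; it assumes stock `nn.BatchNorm*` layers, so the custom batch-norm layers in the workloads would need their own handling.

```python
import torch
from torch import nn

# Assumption: only stock PyTorch batch-norm layers are handled here.
_BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def refresh_bn_stats(model, batches):
    """Re-estimate BN running stats with dropout disabled (hypothetical helper)."""
    was_training = model.training
    model.eval()  # dropout (and everything else) in eval mode
    bn_layers = [m for m in model.modules() if isinstance(m, _BN_TYPES)]
    saved_momentum = [bn.momentum for bn in bn_layers]
    for bn in bn_layers:
        bn.reset_running_stats()  # zero mean, unit var, zero batch counter
        bn.momentum = None        # None means a cumulative, equal-weight average
        bn.train()                # only the BN layers update their buffers
    with torch.no_grad():
        for x in batches:
            model(x)
    for bn, momentum in zip(bn_layers, saved_momentum):
        bn.momentum = momentum
    model.train(was_training)     # restore the original mode on all submodules


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Dropout(0.5))
batches = [torch.randn(16, 4) for _ in range(3)]
refresh_bn_stats(model, batches)
```

Calling this right after swapping in the averaged weights, and before evaluation, would recompute the buffers for the iterate that is actually evaluated.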

Also, having the batch-norm running mean/var provided directly in the API would give a model-agnostic way to access them. Currently we would need to loop over all modules and check whether they are PyTorch batch-norm layers or the custom ConformerBatchNorm and DeepspeechBatchNorm layers.
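For reference, the module loop described above might look roughly like this. `get_bn_stats` and `set_bn_stats` are hypothetical names, and the default `bn_types` tuple only covers stock PyTorch layers; the custom layers would have to be added to it, assuming they expose `running_mean` and `running_var` attributes.

```python
import torch
from torch import nn

_DEFAULT_BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def get_bn_stats(model, bn_types=_DEFAULT_BN_TYPES):
    """Return {module_name: (running_mean, running_var)} as detached copies."""
    stats = {}
    for name, module in model.named_modules():
        # Layers built with track_running_stats=False have no buffers.
        if isinstance(module, bn_types) and module.running_mean is not None:
            stats[name] = (module.running_mean.clone(),
                           module.running_var.clone())
    return stats


def set_bn_stats(model, stats):
    """Write previously saved statistics back into the matching modules."""
    modules = dict(model.named_modules())
    with torch.no_grad():
        for name, (mean, var) in stats.items():
            modules[name].running_mean.copy_(mean)
            modules[name].running_var.copy_(var)


net = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.ReLU())
snapshot = get_bn_stats(net)              # stats before any forward pass
net(torch.randn(2, 3, 8, 8))              # train-mode pass updates the buffers
changed = not torch.equal(net[1].running_mean, snapshot["1"][0])
set_bn_stats(net, snapshot)               # roll the buffers back
```

An API-level accessor would remove the need for the type tuple entirely, which is the point of the request.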

A third point is rules clarity around batch-norm layers. Are we freely allowed to change the batch-norm momentum during training (which allows us to freeze the running stats, reset them, and otherwise change the speed at which they are updated), as well as the running_mean/var buffers?
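As a small illustration of what changing the momentum does, here is a sketch with a stock `nn.BatchNorm1d` layer. It uses PyTorch's convention, `new = (1 - momentum) * old + momentum * batch`; the custom layers in the benchmark may follow a different convention, which this sketch does not cover.

```python
import torch
from torch import nn

bn = nn.BatchNorm1d(2)                       # train mode, default momentum 0.1
x = torch.tensor([[1.0, 2.0], [3.0, 6.0]])   # per-feature batch mean is [2, 4]

bn(x)
after_default = bn.running_mean.clone()      # 0.9 * 0 + 0.1 * [2, 4]

bn.momentum = 0.0                            # freeze: buffers ignore new batches
bn(x)
after_freeze = bn.running_mean.clone()

bn.momentum = 1.0                            # overwrite with the latest batch
bn(x)
after_overwrite = bn.running_mean.clone()

bn.reset_running_stats()                     # back to zero mean, unit variance
after_reset = bn.running_mean.clone()
```

Intermediate momentum values simply change how quickly the buffers track the current batch statistics.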

@priyakasimbeg
Contributor

We're planning on discussing feature requests like these in the benchmark code on Thursday 9/5 during the WG meeting.

@adefazio
Contributor Author

adefazio commented Sep 5, 2024

I've created a PR: #783

@priyakasimbeg
Contributor

Changes merged in #798
