It might be desirable to monitor/detect vanishing gradients during training. Note that I of course mean the "stochastic gradient" here, as estimated from the training samples used in the current epoch; since a single batch may be too small to excite all king/piece positions, preferably track the mean or max absolute gradient over a window of multiple epochs.
This would have detected the anomalies in the input layer (dead weights for some king positions) in vondele's run84run3, see #53.
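For illustration, a minimal sketch of such a monitor for a plain PyTorch training loop (the class name, the window size and the "dead" threshold are made up for this example, not part of the trainer):

```python
import torch

class GradientMonitor:
    """Track per-parameter max |grad| over a window of optimizer steps."""

    def __init__(self, model, window=1000, eps=1e-12):
        self.model = model
        self.window = window  # number of optimizer steps per report
        self.eps = eps        # |grad| below this counts as "dead"
        self.steps = 0
        # Running max |grad| per parameter tensor, kept on the CPU.
        self.max_abs = {name: torch.zeros_like(p, device='cpu')
                        for name, p in model.named_parameters()}

    def after_backward(self):
        # Call once per optimizer step, after loss.backward().
        for name, p in self.model.named_parameters():
            if p.grad is not None:
                self.max_abs[name] = torch.maximum(
                    self.max_abs[name], p.grad.detach().abs().cpu())
        self.steps += 1
        if self.steps % self.window == 0:
            for name, m in self.max_abs.items():
                dead = (m < self.eps).float().mean().item()
                print(f'{name}: max|grad|={m.max().item():.3e}, '
                      f'fraction ~zero={dead:.2%}')
                m.zero_()  # reset for the next window
```

In a Lightning module the `after_backward()` call could go into the `on_after_backward` hook; for the input layer, feature columns whose max |grad| stays ~zero over a whole window would flag dead king/piece positions.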
Note that with GC (gradient centralization) we cannot resort to investigating a mere difference of two checkpoints: the centralized gradient by definition contains a contribution equal to the mean of the gradient vectors over the neurons of a layer (see equation (1) of https://arxiv.org/pdf/2004.01461v2), so even a weight whose own gradient is zero still gets nudged by that mean term and will differ between checkpoints.
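For reference, equation (1) of that paper (quoting its notation from memory) centralizes the gradient of each weight vector $w_i$ with $M$ entries by subtracting its mean:

$$\Phi_{GC}(\nabla_{w_i} L) = \nabla_{w_i} L - \mu_{\nabla_{w_i} L}, \qquad \mu_{\nabla_{w_i} L} = \frac{1}{M}\sum_{j=1}^{M} \nabla_{w_{i,j}} L .$$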
As a "work-around", continued training without GC (use_gc=False in Ranger) on a checkpoint and then comparing/visualizing the difference between a later checkpoint should also do the trick I think.
See also https://discuss.pytorch.org/t/how-to-check-for-vanishing-exploding-gradients/9019