loss got nan. #2

Open
kunrenzhilu opened this issue Apr 13, 2018 · 5 comments

@kunrenzhilu

Train Epoch: 3 [0/60000 (0%)] KLD Loss: 2.687659 NLL Loss: 73.599564
Train Epoch: 3 [2800/60000 (21%)] KLD Loss: 2.976363 NLL Loss: 78.757454
Train Epoch: 3 [5600/60000 (43%)] KLD Loss: 2.837864 NLL Loss: 78.958122
Train Epoch: 3 [8400/60000 (64%)] KLD Loss: nan NLL Loss: nan
Train Epoch: 3 [11200/60000 (85%)] KLD Loss: nan NLL Loss: nan
====> Epoch: 3 Average loss: nan
====> Test set loss: KLD Loss = nan, NLL Loss = nan
Train Epoch: 4 [0/60000 (0%)] KLD Loss: nan NLL Loss: nan

@YuxuanSong

This is due to the numerical instability of \log near zero. Adding a tiny number, e.g. 1e-5, to the output of enc_std and dec_std will help.
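
A minimal sketch of that idea, assuming the std heads are linear layers followed by a softplus (the PositiveStd wrapper and its parameters below are illustrative, not the repository's actual code):

import torch.nn as nn
import torch.nn.functional as F

class PositiveStd(nn.Module):
    """Std head whose output is strictly positive and bounded away from zero."""
    def __init__(self, in_features, out_features, eps=1e-5):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.eps = eps

    def forward(self, h):
        # softplus keeps the std positive; adding eps keeps log(std) and 1/std finite
        return F.softplus(self.linear(h)) + self.eps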

@DuaneNielsen

This may also help:

import torch

# Machine epsilon of float32; keeps log() away from log(0) = -inf
eps = torch.finfo(torch.float32).eps

def _nll_bernoulli(self, theta, x):
    # Bernoulli negative log-likelihood, stabilized with eps
    return -torch.sum(x * torch.log(theta + eps) + (1 - x) * torch.log(1 - theta + eps))

Plus, set a minimum stdev as Mr. Song mentioned above.

@DuaneNielsen commented Aug 18, 2020

Also, this model assumes all inputs are between 0.0 and 1.0, so normalize your data using (x - x.min) / (x.max - x.min) before passing it in.
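
A sketch of that normalization (the small constant in the denominator is an extra guard against constant inputs, not part of the suggestion above):

import torch

def minmax_normalize(x: torch.Tensor) -> torch.Tensor:
    # Rescale the tensor into the [0, 1] range the model expects
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min + 1e-8)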

@yhl48 commented Dec 14, 2022

Where is the normalisation assumption made and how does it affect the model if a different normalisation is used?

@DuaneNielsen commented Dec 16, 2022

Sorry, it's been a couple of years since I looked at this model. Generally speaking, though, NN weights are initialized with Gaussian noise roughly in the range -1.0 to 1.0, so it's usually a good idea to feed in numbers in a similar range; otherwise you can get numerical instability, as the gradients can be greater than 1.0 and cause runaway updates. This is especially problematic when modeling variance, since the gradients on the variance have more variance themselves than those on the mean during training. Floating-point representation also loses precision the further you get from the range -1.0 to 1.0; I believe about 75% of the representable floating-point numbers lie between -1.0 and 1.0, and only 25% cover the rest of the range.

The biggest problem you are going to have when modeling variance is the training process driving the variance very small, so that the likelihood becomes large and positive and goes out of range of your representation. That's why adding a minimum epsilon to the variance is a good idea: it ensures the variance cannot get absurdly small and produce massive (and unrealistic) likelihoods.
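
A sketch of that floor on the standard deviation (MIN_STD is an illustrative value, not one taken from the repository):

import torch

MIN_STD = 1e-5  # illustrative lower bound; tune for your data

def safe_std(raw_std: torch.Tensor) -> torch.Tensor:
    # Keep the learned std away from zero so -log(std) in the Gaussian NLL
    # stays bounded and the likelihood cannot blow up during training.
    return raw_std.clamp(min=MIN_STD)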
