
Fixes problem with Normalizing Flows #21

Merged
merged 1 commit from wuaalb:fix-normflow into casperkaae:master on Nov 2, 2015

Conversation

@wuaalb (Contributor) commented Oct 30, 2015

  • Fixes a problem where p(z0) was used instead of p(zK); see eq. 15 of the
    Rezende NF paper
  • Made `NormalizingPlanarFlowLayer` output the logdet-Jacobian instead of
    `psi_u`, so all logic specific to planar-type flows is contained in the
    layer and other types of flows can be used more easily (see the sketch
    after this list)
  • Various small changes to code, comments and logging for clarity
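
For reference, here is a minimal NumPy sketch of one planar flow step and the logdet-Jacobian it now returns. This is only the math from the NF paper, not the actual `NormalizingPlanarFlowLayer` code; the helper name `planar_flow_step` and its argument layout are made up for illustration.

```python
import numpy as np

def planar_flow_step(z, u, w, b):
    """One planar flow f(z) = z + u_hat * tanh(w.z + b) for a batch of z.

    z: (batch, dim), u and w: (dim,), b: scalar.
    Returns the transformed samples and log|det df/dz| per sample.
    """
    # Reparameterize u as u_hat so that w.u_hat >= -1, keeping f invertible
    # (appendix of the NF paper): u_hat = u + (softplus(w.u) - 1 - w.u) * w/||w||^2
    wu = np.dot(w, u)
    u_hat = u + (np.log1p(np.exp(wu)) - 1.0 - wu) * w / np.dot(w, w)

    a = np.dot(z, w) + b                                 # (batch,)
    f_z = z + np.outer(np.tanh(a), u_hat)                # (batch, dim)
    psi = np.outer(1.0 - np.tanh(a) ** 2, w)             # psi(z) = h'(w.z + b) * w
    logdet = np.log(np.abs(1.0 + np.dot(psi, u_hat)))    # (batch,)
    return f_z, logdet
```

With the layer returning the logdet-Jacobian directly (rather than `psi_u`), the training script only needs to sum these terms over the K flow steps, regardless of which flow type produced them.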

log_pz = log_stdnormal(z).sum(axis=3)
log_px_given_z = log_bernoulli(x, T.clip(x_mu,epsilon,1-epsilon)).sum(axis=3)
log_q0z0_given_x = log_normal2(z0, z0_mu, z0_log_var).sum(axis=3)
log_pzk = log_stdnormal(zk).sum(axis=3)
@wuaalb (Contributor, Author) commented on the lines above:

This is the main change: `log_stdnormal(z).sum(axis=3)` → `log_stdnormal(zk).sum(axis=3)`.
Most other changes are to clarify the code.
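
For context, the flow-based free energy bound of eq. 15 in the Rezende & Mohamed paper, roughly paraphrased for planar flows, is

$$
\mathcal{F}(x) = \mathbb{E}_{q_0(z_0)}\!\left[\log q_0(z_0) - \sum_{k=1}^{K} \log\left|1 + u_k^\top \psi_k(z_{k-1})\right| - \log p(x, z_K)\right],
$$

so both the prior term log p(z) and the likelihood term log p(x|z) have to be evaluated at the final sample zK, not at z0; hence `log_pzk = log_stdnormal(zk).sum(axis=3)`.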

@wuaalb (Contributor, Author) commented Oct 30, 2015

After some training, the default `iw_vae_normflow.py` gets:

*Epoch=1000     Time=7.87       LR=0.00100      eq_samples=1    iw_samples=1    nflows=5
  TRAIN:        Cost=-91.19209  logqK(zK|x)=-116.52648  = [logq0(z0|x)=-116.33376 - sum logdet J=0.19272]       logp(zK)=-141.95990     logp(x|zK)=-65.75868
  EVAL-L1:      Cost=-90.62387  logqK(zK|x)=-116.30897  = [logq0(z0|x)=-116.11150 - sum logdet J=0.19747]       logp(zK)=-141.43175     logp(x|zK)=-65.50109
  EVAL-L5000:   Cost=-86.20597  logqK(zK|x)=-116.27862  = [logq0(z0|x)=-116.08018 - sum logdet J=0.19843]       logp(zK)=-141.38884     logp(x|zK)=-65.48415

Not sure how this compares to `iw_vae.py`.

@casperkaae (Owner)
Thanks. I'll try to review your changes tomorrow.

Regarding the performance: I made some preliminary tests that showed a small performance gain using normalizing flows. The performance you report seems to be in the right ballpark. However, I think we need to increase the number of flow transformations (maybe to 40 or 80, as in the paper) before we see a substantial performance gain.

As a baseline I get around LL_5000 ≈ -85 after 10000 epochs using a VAE with (a rough sketch of this architecture follows the list):

  • 500 hidden units,
  • 100 latent units,
  • rectifiers in the encoder,
  • very leaky rectifiers in the decoder,
  • batch normalization
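
(Not the actual `iw_vae.py` model; just a rough Lasagne sketch of the listed choices, with illustrative layer names and with the latent sampling and bound computation omitted.)

```python
# Rough Lasagne sketch of the baseline VAE described above (NOT the actual
# iw_vae.py model): 500 hidden units, 100 latent units, ReLU encoder,
# very leaky ReLU decoder, batch normalization. Sampling of z via the
# reparameterization trick and the bound itself are omitted here.
from lasagne.layers import InputLayer, DenseLayer, batch_norm
from lasagne.nonlinearities import rectify, very_leaky_rectify, sigmoid, identity

num_features, num_hidden, num_latent = 784, 500, 100

# Encoder q(z|x): x -> h -> (mu, log_var)
l_in = InputLayer(shape=(None, num_features))
l_enc = batch_norm(DenseLayer(l_in, num_hidden, nonlinearity=rectify))
l_mu = DenseLayer(l_enc, num_latent, nonlinearity=identity)
l_log_var = DenseLayer(l_enc, num_latent, nonlinearity=identity)

# Decoder p(x|z): z -> h -> Bernoulli mean over pixels
l_z = InputLayer(shape=(None, num_latent))
l_dec = batch_norm(DenseLayer(l_z, num_hidden, nonlinearity=very_leaky_rectify))
l_x_mu = DenseLayer(l_dec, num_features, nonlinearity=sigmoid)
```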

@casperkaae (Owner)
Looks good to me @wuaalb

Just two suggestions / questions:

  1. The performance you report was run with the changes you made to the code, right?
  2. Can you change the default hyperparameters of `iw_vae_normflow.py` to the same values as used in `iw_vae.py`?

@wuaalb (Contributor, Author) commented Nov 2, 2015

Thanks for reviewing.

  1. Yes, those results are after applying these changes, and with the default hyper-parameters (but I didn't wait for the full 10k epochs). Before making the changes, the EVAL-L5000 was ever-decreasing (I ran it once with non-default settings and it was at -65 at epoch 800 when I stopped it).
  2. I think this commit doesn't change the default hyper-parameters, and they were already identical?

It would be interesting to see if using many more transformations would help. I'm just worried it will take a long time to train, and that maybe the current default hyper-parameters are quite different from the paper's (much higher learning rate, batch normalization, rectifier non-linearities, ...; although exactly what the paper uses for the MNIST results isn't totally clear to me).

Did you by chance happen to do the experiment in section 6.1 (figure 3) of the paper? I think it would be a nice way to ensure normalizing flows are implemented correctly, but I'm not completely sure how it is done.

@casperkaae (Owner)
@wuaalb

Ah yes, the HPs were already changed to the same values :). I've looked at the experiment in section 6.1, but I couldn't quite figure out how they did it. I'll try to write an email to the authors and ask how they did it.

I think we should just add a warning at the beginning of the example that it is work in progress / not tested completely, and then merge the changes. I'll hopefully be able to run some comparison tests on the implementation before too long.

@wuaalb (Contributor, Author) commented Nov 2, 2015

I'll run the default `iw_vae.py` for ~1000 epochs now, just to have an idea (my guess is the results will be almost identical).

Anyways, I think merging now is a good idea. At the very least it should be a step in the right direction.

FWIW, I had a half-assed go at the section 6.1 experiment once; I think I generated samples from q0(z) = N(z; mu, sigma^2 I), with mu and sigma^2 set by hand, and then minimized the KL divergence between qK(z) and the (unnormalized) true distribution q_true(z) = e^{-U(z)}.
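
Something like the following NumPy sketch of that objective (assuming the `planar_flow_step` helper from the sketch above and a hand-picked energy function `U`, e.g. one of the paper's test energies); in practice one would minimize it with Theano gradients rather than evaluate it by hand:

```python
import numpy as np

def flow_kl_objective(flow_params, mu, log_sigma, U, n_samples=1000, rng=np.random):
    """Monte Carlo estimate of KL(qK || e^{-U}/Z), up to the unknown log Z.

    flow_params: list of (u, w, b) tuples, one per planar flow step.
    mu, log_sigma: hand-picked parameters of the base density q0 = N(mu, sigma^2 I).
    U: energy function mapping (n_samples, dim) -> (n_samples,).
    """
    dim = mu.shape[0]
    eps = rng.randn(n_samples, dim)
    z = mu + np.exp(log_sigma) * eps

    # log q0(z0) for the diagonal Gaussian base distribution
    log_q0 = -0.5 * np.sum(eps ** 2 + 2.0 * log_sigma + np.log(2.0 * np.pi), axis=1)

    # Push the samples through the flow, accumulating logdet-Jacobian terms,
    # so that log qK(zK) = log q0(z0) - sum_k logdet_k.
    sum_logdet = np.zeros(n_samples)
    for u, w, b in flow_params:
        z, logdet = planar_flow_step(z, u, w, b)
        sum_logdet += logdet

    # E_q0[ log qK(zK) + U(zK) ] = KL(qK || e^{-U}/Z) - log Z
    return np.mean(log_q0 - sum_logdet + U(z))
```

Minimizing this over the flow parameters and then comparing (histograms of) the zK samples against e^{-U(z)} is roughly what figure 3 of the paper shows.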

I think the results looked something like this:

[image: _nf_test]

I'm pretty sure it is more likely that I did something wrong than that there's a problem with `NormalizingPlanarFlowLayer`, etc.

@casperkaae (Owner)
I'll hold off on the merge until you report back with the performance.

I have opened a new issue (#22) about reproducing the results in sec. 6.1 of the normalizing flows paper, where we can discuss that further.

@wuaalb (Contributor, Author) commented Nov 2, 2015

Results from `iw_vae.py` (default settings) after 1000 epochs:

*Epoch=1000     Time=6.02       LR=0.00100      eq_samples=1    iw_samples=1
  TRAIN:        Cost=-91.28976  logq(z|x)=-116.71328    logp(z)=-141.99548      logp(x|z)=-66.00757
  EVAL-L1:      Cost=-90.70776  logq(z|x)=-116.72040    logp(z)=-141.51874      logp(x|z)=-65.90942
  EVAL-L5000:   Cost=-86.43427  logq(z|x)=-116.69676    logp(z)=-141.50031      logp(x|z)=-65.92197

So, as expected, very slightly worse compared to using normalizing flows (with flow length 5).

@casperkaae (Owner)
Great - at least that indicates that the norm-flow code is working as expected.

I'll merge this now

Thanks

casperkaae added a commit that referenced this pull request Nov 2, 2015
Fixes problem with Normalizing Flows
@casperkaae merged commit ea5bbce into casperkaae:master on Nov 2, 2015
@wuaalb deleted the fix-normflow branch on November 3, 2015, 16:08