
Fixes problem with Normalizing Flows #21

Merged
merged 1 commit from wuaalb:fix-normflow into casperkaae:master on Nov 2, 2015

Conversation

@wuaalb (Contributor) commented Oct 30, 2015

  • Fixes a problem where p(z0) was used instead of p(zK); see eq. 15 of the
    Rezende NF paper
  • Made `NormalizingPlanarFlowLayer` output the logdet-Jacobian instead of
    `psi_u`, so all logic specific to planar-type flows is contained in the
    layer and other types of flows can be used more easily (see the sketch
    after this list)
  • Various small changes to code, comments and logging for clarity
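
For reference, here is a minimal NumPy sketch of one planar flow step and the logdet-Jacobian it now returns. This is only the math from the NF paper, not the actual `NormalizingPlanarFlowLayer` code; the helper name `planar_flow_step` and its argument layout are made up for illustration.

```python
import numpy as np

def planar_flow_step(z, u, w, b):
    """One planar flow f(z) = z + u_hat * tanh(w.z + b) for a batch of z.

    z: (batch, dim), u and w: (dim,), b: scalar.
    Returns the transformed samples and log|det df/dz| per sample.
    """
    # Reparameterize u as u_hat so that w.u_hat >= -1, keeping f invertible
    # (appendix of the NF paper): u_hat = u + (softplus(w.u) - 1 - w.u) * w/||w||^2
    wu = np.dot(w, u)
    u_hat = u + (np.log1p(np.exp(wu)) - 1.0 - wu) * w / np.dot(w, w)

    a = np.dot(z, w) + b                                 # (batch,)
    f_z = z + np.outer(np.tanh(a), u_hat)                # (batch, dim)
    psi = np.outer(1.0 - np.tanh(a) ** 2, w)             # psi(z) = h'(w.z + b) * w
    logdet = np.log(np.abs(1.0 + np.dot(psi, u_hat)))    # (batch,)
    return f_z, logdet
```

With the layer returning the logdet-Jacobian directly (rather than `psi_u`), the training script only needs to sum these terms over the K flow steps, regardless of which flow type produced them.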

log_pz = log_stdnormal(z).sum(axis=3)
log_px_given_z = log_bernoulli(x, T.clip(x_mu,epsilon,1-epsilon)).sum(axis=3)
log_q0z0_given_x = log_normal2(z0, z0_mu, z0_log_var).sum(axis=3)
log_pzk = log_stdnormal(zk).sum(axis=3)
@wuaalb (Contributor, Author) commented on the lines above:

This is the main change: `log_stdnormal(z).sum(axis=3)` → `log_stdnormal(zk).sum(axis=3)`.
Most other changes are to clarify the code.
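
For context, the flow-based free energy bound of eq. 15 in the Rezende & Mohamed paper, roughly paraphrased for planar flows, is

$$
\mathcal{F}(x) = \mathbb{E}_{q_0(z_0)}\!\left[\log q_0(z_0) - \sum_{k=1}^{K} \log\left|1 + u_k^\top \psi_k(z_{k-1})\right| - \log p(x, z_K)\right],
$$

so both the prior term log p(z) and the likelihood term log p(x|z) have to be evaluated at the final sample zK, not at z0; hence `log_pzk = log_stdnormal(zk).sum(axis=3)`.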

@wuaalb (Contributor, Author) commented Oct 30, 2015

After some training, the default `iw_vae_normflow.py` gets:

*Epoch=1000     Time=7.87       LR=0.00100      eq_samples=1    iw_samples=1    nflows=5
  TRAIN:        Cost=-91.19209  logqK(zK|x)=-116.52648  = [logq0(z0|x)=-116.33376 - sum logdet J=0.19272]       logp(zK)=-141.95990     logp(x|zK)=-65.75868
  EVAL-L1:      Cost=-90.62387  logqK(zK|x)=-116.30897  = [logq0(z0|x)=-116.11150 - sum logdet J=0.19747]       logp(zK)=-141.43175     logp(x|zK)=-65.50109
  EVAL-L5000:   Cost=-86.20597  logqK(zK|x)=-116.27862  = [logq0(z0|x)=-116.08018 - sum logdet J=0.19843]       logp(zK)=-141.38884     logp(x|zK)=-65.48415

Not sure how this compares to `iw_vae.py`.

@casperkaae (Owner)
Thanks. I'll try to review your changes tomorrow.

Regarding the performance: I made some preliminary tests that showed a small performance gain using normalizing flows. The performance you report seems to be in the right ballpark. However, I think we need to increase the number of flow transformations (maybe to 40 or 80, as in the paper) before we see a substantial performance gain.

As a baseline I get around LL_5000 ≈ -85 after 10000 epochs using a VAE with (a rough sketch of this architecture follows the list):

  • 500 hidden units,
  • 100 latent units,
  • rectifiers in the encoder,
  • very leaky rectifiers in the decoder,
  • batch normalization
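
(Not the actual `iw_vae.py` model; just a rough Lasagne sketch of the listed choices, with illustrative layer names and with the latent sampling and bound computation omitted.)

```python
# Rough Lasagne sketch of the baseline VAE described above (NOT the actual
# iw_vae.py model): 500 hidden units, 100 latent units, ReLU encoder,
# very leaky ReLU decoder, batch normalization. Sampling of z via the
# reparameterization trick and the bound itself are omitted here.
from lasagne.layers import InputLayer, DenseLayer, batch_norm
from lasagne.nonlinearities import rectify, very_leaky_rectify, sigmoid, identity

num_features, num_hidden, num_latent = 784, 500, 100

# Encoder q(z|x): x -> h -> (mu, log_var)
l_in = InputLayer(shape=(None, num_features))
l_enc = batch_norm(DenseLayer(l_in, num_hidden, nonlinearity=rectify))
l_mu = DenseLayer(l_enc, num_latent, nonlinearity=identity)
l_log_var = DenseLayer(l_enc, num_latent, nonlinearity=identity)

# Decoder p(x|z): z -> h -> Bernoulli mean over pixels
l_z = InputLayer(shape=(None, num_latent))
l_dec = batch_norm(DenseLayer(l_z, num_hidden, nonlinearity=very_leaky_rectify))
l_x_mu = DenseLayer(l_dec, num_features, nonlinearity=sigmoid)
```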

@casperkaae (Owner)
Looks good to me @wuaalb

Just two suggestions / questions:

  1. The performance you report was run with the changes you made to the code, right?
  2. Can you change the default hyperparameters of `iw_vae_normflow.py` to the same values as used in `iw_vae.py`?

@wuaalb (Contributor, Author) commented Nov 2, 2015

Thanks for reviewing.

  1. Yes, those results are after applying these changes, and with the default hyper-parameters (but I didn't wait for the full 10k epochs). Before making the changes, the EVAL-L5000 was ever-decreasing (I ran it once with non-default settings and it was at -65 at epoch 800 when I stopped it).
  2. I think this commit doesn't change the default hyper-parameters, and they were already identical?

It would be interesting to see if using many more transformations would help. I'm just worried it will take a long time to train, and that maybe the current default hyper-parameters are quite different from the paper's (much higher learning rate, batch normalization, rectifier non-linearities, ...; although exactly what the paper uses for the MNIST results isn't totally clear to me).

Did you by chance happen to do the experiment in section 6.1 (figure 3) of the paper? I think it would be a nice way to ensure normalizing flows are implemented correctly, but I'm not completely sure how it is done.

@casperkaae (Owner)
@wuaalb

Ah yes, the HPs were already changed to the same values :). I've looked at the experiment in section 6.1, but I couldn't quite figure out how they did it. I'll try to write an email to the authors and ask how they did it.

I think we should just add a warning at the beginning of the example that it is work in progress / not tested completely, and then merge the changes. I'll hopefully be able to run some comparison tests on the implementation before too long.

@wuaalb (Contributor, Author) commented Nov 2, 2015

I'll run the default `iw_vae.py` for ~1000 epochs now, just to have an idea (my guess is the results will be almost identical).

Anyways, I think merging now is a good idea. At the very least it should be a step in the right direction.

FWIW, I had a half-assed go at the section 6.1 experiment once; I think I generated samples from q0(z) = N(z; mu, sigma^2 I), with mu and sigma^2 set by hand, and then minimized the KL divergence between qK(z) and the (unnormalized) true distribution q_true(z) = e^{-U(z)}.
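
Something like the following NumPy sketch of that objective (assuming the `planar_flow_step` helper from the sketch above and a hand-picked energy function `U`, e.g. one of the paper's test energies); in practice one would minimize it with Theano gradients rather than evaluate it by hand:

```python
import numpy as np

def flow_kl_objective(flow_params, mu, log_sigma, U, n_samples=1000, rng=np.random):
    """Monte Carlo estimate of KL(qK || e^{-U}/Z), up to the unknown log Z.

    flow_params: list of (u, w, b) tuples, one per planar flow step.
    mu, log_sigma: hand-picked parameters of the base density q0 = N(mu, sigma^2 I).
    U: energy function mapping (n_samples, dim) -> (n_samples,).
    """
    dim = mu.shape[0]
    eps = rng.randn(n_samples, dim)
    z = mu + np.exp(log_sigma) * eps

    # log q0(z0) for the diagonal Gaussian base distribution
    log_q0 = -0.5 * np.sum(eps ** 2 + 2.0 * log_sigma + np.log(2.0 * np.pi), axis=1)

    # Push the samples through the flow, accumulating logdet-Jacobian terms,
    # so that log qK(zK) = log q0(z0) - sum_k logdet_k.
    sum_logdet = np.zeros(n_samples)
    for u, w, b in flow_params:
        z, logdet = planar_flow_step(z, u, w, b)
        sum_logdet += logdet

    # E_q0[ log qK(zK) + U(zK) ] = KL(qK || e^{-U}/Z) - log Z
    return np.mean(log_q0 - sum_logdet + U(z))
```

Minimizing this over the flow parameters and then comparing (histograms of) the zK samples against e^{-U(z)} is roughly what figure 3 of the paper shows.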

I think the results looked something like this:

[image: _nf_test]

I'm pretty sure it is more likely that I did something wrong than that there's a problem with `NormalizingPlanarFlowLayer`, etc.

@casperkaae (Owner)
I'll hold off on the merge until you report back with the performance.

I have opened a new issue (#22) about reproducing the results in sec. 6.1 of the normalizing flows paper, where we can discuss that further.

@wuaalb (Contributor, Author) commented Nov 2, 2015

Results from `iw_vae.py` (default settings) after 1000 epochs:

*Epoch=1000     Time=6.02       LR=0.00100      eq_samples=1    iw_samples=1
  TRAIN:        Cost=-91.28976  logq(z|x)=-116.71328    logp(z)=-141.99548      logp(x|z)=-66.00757
  EVAL-L1:      Cost=-90.70776  logq(z|x)=-116.72040    logp(z)=-141.51874      logp(x|z)=-65.90942
  EVAL-L5000:   Cost=-86.43427  logq(z|x)=-116.69676    logp(z)=-141.50031      logp(x|z)=-65.92197

So, as expected, very slightly worse compared to using normalizing flows (with flow length 5).

@casperkaae (Owner)
Great - at least that indicates that the norm-flow code is working as expected.

I'll merge this now

Thanks

casperkaae added a commit that referenced this pull request Nov 2, 2015
Fixes problem with Normalizing Flows
@casperkaae merged commit ea5bbce into casperkaae:master on Nov 2, 2015
@wuaalb deleted the fix-normflow branch on November 3, 2015, 16:08