Fixes problem with Normalizing Flows #21
Conversation
- Fixes problem where p(z0) was used instead of p(zK); see eq. 15 of the Rezende NF paper
- Made `NormalizingPlanarFlowLayer` output the logdet-Jacobian instead of `psi_u`, so all logic specific to planar-type flows is contained in the layer and other types of flows can be used more easily
- Various small changes to code, comments and logging for clarity
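As context for the second bullet, here is a minimal NumPy sketch of a planar flow step from the Rezende NF paper (eqs. 10-12 and appendix A), where the transformation returns the logdet-Jacobian directly rather than intermediate quantities like `psi_u`. This is an illustrative sketch, not the repo's Theano `NormalizingPlanarFlowLayer`; the function name and signature here are assumptions.

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar transformation f(z) = z + u_hat * tanh(w.z + b).

    z is (batch, dim); returns (f(z), log|det df/dz|) so callers only
    need the logdet term, independent of the flow type.
    """
    # Enforce invertibility: reparameterize u as u_hat with w.u_hat >= -1
    # (appendix A of the paper), using m(x) = -1 + softplus(x).
    wu = w @ u
    u_hat = u + (np.log1p(np.exp(wu)) - 1.0 - wu) * w / (w @ w)
    lin = z @ w + b                                   # (batch,)
    f_z = z + np.outer(np.tanh(lin), u_hat)           # (batch, dim)
    psi = np.outer(1.0 - np.tanh(lin) ** 2, w)        # psi(z) = h'(w.z + b) w
    logdet = np.log(np.abs(1.0 + psi @ u_hat))        # (batch,)
    return f_z, logdet
```

Because each flow layer emits its own logdet term, stacking K layers just means summing the K logdet outputs in the bound.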
log_pz = log_stdnormal(z).sum(axis=3)
log_px_given_z = log_bernoulli(x, T.clip(x_mu, epsilon, 1-epsilon)).sum(axis=3)
log_q0z0_given_x = log_normal2(z0, z0_mu, z0_log_var).sum(axis=3)
log_pzk = log_stdnormal(zk).sum(axis=3)
This is the main change: `log_stdnormal(z).sum(axis=3)` becomes `log_stdnormal(zk).sum(axis=3)`. Most other changes are to clarify code.
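To make the fix concrete, here is a hedged NumPy sketch of the flow bound from eq. 15 of the Rezende NF paper, log p(x) ≥ E_q0[log p(x|zK) + log p(zK) − log q0(z0|x) + Σ_k logdet_k]. The key point is that the prior term is evaluated at the *last* flow sample zK, not z0. This mirrors the diff above but is not the repo's Theano helpers; the helper names and shapes here are assumptions.

```python
import numpy as np

def log_stdnormal(z):
    # elementwise log N(z; 0, 1)
    return -0.5 * (np.log(2 * np.pi) + z ** 2)

def log_normal2(z, mu, log_var):
    # elementwise log N(z; mu, exp(log_var))
    return -0.5 * (np.log(2 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var))

def log_bernoulli(x, p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)
    return x * np.log(p) + (1 - x) * np.log(1 - p)

def flow_elbo(x, x_mu, z0, z0_mu, z0_log_var, zk, sum_logdet):
    """Per-sample flow ELBO; all arrays are (batch, dim) except sum_logdet (batch,)."""
    log_pzk = log_stdnormal(zk).sum(axis=-1)          # prior at zK -- the fix
    log_px_given_z = log_bernoulli(x, x_mu).sum(axis=-1)
    log_q0 = log_normal2(z0, z0_mu, z0_log_var).sum(axis=-1)
    return log_px_given_z + log_pzk - log_q0 + sum_logdet
```

With zero flow steps (zk equal to z0 and sum_logdet zero) this reduces to the plain VAE ELBO, which is a useful sanity check.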
After some training with the default settings:
Not sure how this compares to
Thanks. I'll try to review your changes tomorrow. Regarding the performance, I made some preliminary tests that showed a small performance gain using normalizing flows. The performance you report seems to be in the right ballpark. However, I think we need to increase the number of flow transformations (maybe to 40 or 80 as in the paper) before we see a substantial performance gain. As a baseline I get around LL_5000 ≈ -85 after 10000 epochs using a VAE with
Looks good to me @wuaalb. Just two suggestions / questions:
Thanks for reviewing.
It would be interesting to see if using many more transformations would help. I'm just worried it will take a long time to train, and that maybe the current default hyper-parameters are quite different from the paper's (much higher learning rate, batch normalization, rectifier non-linearities, ..; although what the paper uses exactly for the MNIST results isn't totally clear to me). Did you by chance happen to do the experiment in section 6.1 (figure 3) of the paper? I think it would be a nice way to ensure normalizing flows are implemented correctly, but I'm not completely sure how it is done.
Ah yes, the HPs were already changed to the same values :). I've looked at the experiment in section 6.1, but I couldn't quite figure out how they did it. I'll try to write an email to the author and ask. I think we should just add a warning at the beginning of the example that it is work in progress / not completely tested, and then just merge the changes. I'll hopefully be able to run some comparison tests on the implementation before too long.
I'll wait with the merge until you report back with the performance. I have opened a new issue (#22) wrt reproducing the results in sec. 6.1 of the "Normalizing flows" paper, where we can discuss that further.
Results from
So as expected, very slightly worse compared to using normalizing flows (with length 5).
Great, at least that indicates that the norm-flow code is working as expected. I'll merge this now. Thanks!