Problem about loss_ce #14

Open
MC-E opened this issue Sep 10, 2020 · 9 comments

@MC-E

MC-E commented Sep 10, 2020

Thanks for your novel work! But I'm a little confused about loss_ce: l_forw_ce = self.train_opt['lambda_ce_forw'] * torch.sum(z**2) / z.shape[0]. I would like to know why this loss function constrains z to follow a Gaussian distribution. Looking forward to your reply!

@pkuxmq
Owner

pkuxmq commented Sep 10, 2020

Hi, the strict distribution matching is realized by the JS divergence on X. As stated in the paper, because of the invertibility, the distributions match on X if and only if (y, z) follows the joint distribution of (f^y(q(x)), p(z)), which means z follows the Gaussian distribution and z is independent of y. In practice, we introduce a pre-training stage for stable training. In this stage, we use cross-entropy as a weaker surrogate objective that pushes the density of z towards p(z), but the distributions may not strictly match in principle.
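For concreteness, here is a minimal sketch of that pre-training surrogate as it appears in the line quoted above (the function wrapper and default weight are my own, added only for illustration): it is an L2 penalty on the latent z averaged over the batch, which pushes z towards small norm around zero but does not by itself enforce a full standard normal.

```python
import torch

def forward_ce_surrogate(z, lambda_ce_forw=1.0):
    # L2 penalty on the latent z, averaged over the batch dimension,
    # matching l_forw_ce = lambda_ce_forw * torch.sum(z**2) / z.shape[0].
    # This only pushes the density of z towards p(z) = N(0, I); the strict
    # match is handled by the JS divergence on X.
    return lambda_ce_forw * torch.sum(z ** 2) / z.shape[0]
```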

@MC-E
Author

MC-E commented Sep 10, 2020

Thanks for your reply! Does this mean that l_forw_ce = self.train_opt['lambda_ce_forw'] * torch.sum(z**2) / z.shape[0] is only an auxiliary component rather than a strict constraint enforcing a Gaussian distribution?


@pkuxmq
Owner

pkuxmq commented Sep 10, 2020

Yes, and we have ablation experiments on it in the paper.

@codyshen0000

codyshen0000 commented Oct 19, 2020

@pkuxmq
So can we use the Gaussian distribution's log_prob to calculate l_forw_ce? As I understand it, the l_forw_ce loss works by constraining z towards a Gaussian distribution.
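For what it's worth, a small sketch of what that could look like with torch.distributions (hypothetical, not how this repository computes the loss): the standard-normal negative log-likelihood differs from the sum(z**2) form only by a factor of 1/2 and an additive constant, so as a training signal they behave the same up to scaling.

```python
import math
import torch
from torch.distributions import Normal

def forward_nll(z):
    # Negative log-likelihood of z under N(0, 1), summed over dimensions
    # and averaged over the batch.
    return -Normal(0.0, 1.0).log_prob(z).sum() / z.shape[0]

def forward_l2(z):
    # The surrogate quoted in this issue (weighting factor omitted).
    return torch.sum(z ** 2) / z.shape[0]

z = torch.randn(4, 8)
d = z[0].numel()
# forward_nll(z) equals 0.5 * forward_l2(z) + 0.5 * d * log(2 * pi)
print(forward_nll(z).item(), 0.5 * forward_l2(z).item() + 0.5 * d * math.log(2 * math.pi))
```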

@howardyclo

howardyclo commented Jan 16, 2021

I think it might also be related to this answer: Why is regularization interpreted as a Gaussian prior on my weights?
Minimizing the L2 loss of z has the probabilistic interpretation of assuming z is drawn from a normal distribution (mean = 0, std = 1); thus minimizing L2_norm(z) is equivalent to maximizing the likelihood of the normal distribution N(z; mean=0, std=1). Is that correct?
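For reference, the algebra behind that interpretation (a standard identity, not specific to this paper), writing d for the dimensionality of z:

```latex
-\log \mathcal{N}(z;\, 0, I) \;=\; \tfrac{1}{2}\,\lVert z \rVert_2^2 \;+\; \tfrac{d}{2}\log(2\pi)
```

So minimizing the squared norm of z is the same as maximizing its log-likelihood under a standard normal, up to a factor of 1/2 and an additive constant.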
