Problem about loss_ce #14

Open
MC-E opened this issue Sep 10, 2020 · 9 comments

@MC-E

MC-E commented Sep 10, 2020

Thanks for your novel work! But I'm a little confused about loss_ce: l_forw_ce = self.train_opt['lambda_ce_forw'] * torch.sum(z**2) / z.shape[0]. I would like to know why this loss function constrains z to follow a Gaussian distribution. Looking forward to your reply!

@pkuxmq
Owner

pkuxmq commented Sep 10, 2020

Hi, the strict distribution matching is realized by the JS divergence on X. As stated in the paper, because of the invertibility, the distributions match on X if and only if (y, z) follows the joint distribution of (f^y(q(x)), p(z)), which means z follows the Gaussian distribution and z is independent of y. In practice, we introduce a pre-training stage for stable training. In this stage, we use cross-entropy as a weaker surrogate objective that pushes the density of z towards p(z), but the distributions may not strictly match in principle.
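For concreteness, here is a minimal sketch of that pre-training surrogate as it appears in the line quoted above (the function wrapper and default weight are my own, added only for illustration): it is an L2 penalty on the latent z averaged over the batch, which pushes z towards small norm around zero but does not by itself enforce a full standard normal.

```python
import torch

def forward_ce_surrogate(z, lambda_ce_forw=1.0):
    # L2 penalty on the latent z, averaged over the batch dimension,
    # matching l_forw_ce = lambda_ce_forw * torch.sum(z**2) / z.shape[0].
    # This only pushes the density of z towards p(z) = N(0, I); the strict
    # match is handled by the JS divergence on X.
    return lambda_ce_forw * torch.sum(z ** 2) / z.shape[0]
```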

@MC-E
Author

MC-E commented Sep 10, 2020

Thanks for your reply! Does this mean that l_forw_ce = self.train_opt['lambda_ce_forw'] * torch.sum(z**2) / z.shape[0] is only an auxiliary component rather than a strict constraint enforcing a Gaussian distribution?


@pkuxmq
Owner

pkuxmq commented Sep 10, 2020

Yes, and we have ablation experiments on it in the paper.

@codyshen0000

codyshen0000 commented Oct 19, 2020

@pkuxmq
So can we use the Gaussian distribution's log_prob to calculate l_forw_ce? As I understand it, the l_forw_ce loss works by constraining z towards a Gaussian distribution.
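For what it's worth, a small sketch of what that could look like with torch.distributions (hypothetical, not how this repository computes the loss): the standard-normal negative log-likelihood differs from the sum(z**2) form only by a factor of 1/2 and an additive constant, so as a training signal they behave the same up to scaling.

```python
import math
import torch
from torch.distributions import Normal

def forward_nll(z):
    # Negative log-likelihood of z under N(0, 1), summed over dimensions
    # and averaged over the batch.
    return -Normal(0.0, 1.0).log_prob(z).sum() / z.shape[0]

def forward_l2(z):
    # The surrogate quoted in this issue (weighting factor omitted).
    return torch.sum(z ** 2) / z.shape[0]

z = torch.randn(4, 8)
d = z[0].numel()
# forward_nll(z) equals 0.5 * forward_l2(z) + 0.5 * d * log(2 * pi)
print(forward_nll(z).item(), 0.5 * forward_l2(z).item() + 0.5 * d * math.log(2 * math.pi))
```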

@howardyclo

howardyclo commented Jan 16, 2021

I think it might also be related to this answer: Why is regularization interpreted as a Gaussian prior on my weights?
Minimizing the L2 loss of z has the probabilistic interpretation of assuming z is drawn from a normal distribution (mean = 0, std = 1); thus minimizing L2_norm(z) is equivalent to maximizing the likelihood of the normal distribution N(z; mean=0, std=1). Is that correct?
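For reference, the algebra behind that interpretation (a standard identity, not specific to this paper), writing d for the dimensionality of z:

```latex
-\log \mathcal{N}(z;\, 0, I) \;=\; \tfrac{1}{2}\,\lVert z \rVert_2^2 \;+\; \tfrac{d}{2}\log(2\pi)
```

So minimizing the squared norm of z is the same as maximizing its log-likelihood under a standard normal, up to a factor of 1/2 and an additive constant.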
