Problem about loss_ce #14
Thanks for your novel work! But I'm a little confused about loss_ce: l_forw_ce = self.train_opt['lambda_ce_forw'] * torch.sum(z**2) / z.shape[0]. I want to know why this loss function can constrain z to follow the Gaussian distribution. Looking forward to your reply!
Comments
Hi, the strict distribution matching is realized by the JS divergence on X. As stated in the paper, because of the invertibility, the distributions match on X if and only if (y, z) follows the joint distribution of (f^y(q(x)), p(z)), which means z follows the Gaussian distribution and z is independent of y. In practice, we introduce a pre-training stage for stable training. In this stage, we use cross-entropy as a weaker surrogate objective that pushes the density of z towards p(z), but the distributions may not strictly match in principle.
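To make the surrogate concrete, here is a minimal sketch (illustrative variable names, not code from this repository): up to a factor of 2 and an additive constant, torch.sum(z**2) / z.shape[0] is the per-sample negative log-likelihood (i.e. the cross-entropy) of a standard Gaussian, which is why minimizing it pushes the density of z towards p(z) without by itself enforcing independence from y.

```python
import math
import torch

# Illustrative sketch, not from the repository: compare the loss_ce term with
# the per-sample negative log-likelihood of a standard Gaussian p(z) = N(0, I).
batch, dim = 16, 8            # assumed shapes for demonstration
z = torch.randn(batch, dim)

surrogate = torch.sum(z ** 2) / z.shape[0]   # the l_forw_ce term (without lambda)

# Monte-Carlo estimate of E[-log N(z; 0, I)] on the batch
nll = 0.5 * torch.sum(z ** 2, dim=1) + 0.5 * dim * math.log(2 * math.pi)
gaussian_cross_entropy = nll.mean()

# gaussian_cross_entropy == 0.5 * surrogate + 0.5 * dim * log(2*pi), so
# minimizing the surrogate minimizes the Gaussian cross-entropy of z,
# but it does not constrain the dependence between z and y.
print(surrogate.item(), gaussian_cross_entropy.item())
```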
Thanks for your reply! Does it mean that l_forw_ce = self.train_opt['lambda_ce_forw'] * torch.sum(z**2) / z.shape[0] is only an auxiliary component rather than a strict constraint enforcing the Gaussian distribution?
Yes, and we have ablation experiments on it in the paper.
@pkuxmq |
I think it might also be related to this answer: Why is regularization interpreted as a Gaussian prior on my weights?
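For reference, the connection that answer points at can be sketched with the standard MAP argument (not specific to this repository): a zero-mean Gaussian prior on a variable turns into a squared-norm penalty in the log-posterior, which is exactly the form of the z**2 term above with sigma = 1.

```latex
% MAP estimation with a Gaussian prior reduces to an L2 penalty:
\begin{aligned}
\log p(w \mid \mathcal{D})
  &= \log p(\mathcal{D} \mid w) + \log p(w) + \text{const} \\
  &= \log p(\mathcal{D} \mid w) - \tfrac{1}{2\sigma^{2}} \lVert w \rVert_2^{2} + \text{const},
  \qquad p(w) = \mathcal{N}(0, \sigma^{2} I).
\end{aligned}
```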