
ShiftCrossEntropy passing probabilities to nn.CrossEntropyLoss instead of logits #4

Open
manipopopo opened this issue Jul 14, 2024 · 1 comment

Comments

@manipopopo

The ShiftCrossEntropy currently utilizes nn.CrossEntropyLoss as its backend, which expects the input to be unnormalized logits. It appears that ShiftCrossEntropy passes input probabilities and target probabilities to the backend instead. This might lead to a deviation from the expected behavior described in equation (7) of the paper.

return self.criterion(x1, shift_x2)
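
For illustration, a minimal sketch (the tensors below are hypothetical stand-ins for x1 and shift_x2, assumed to already be probabilities) of how feeding probabilities to nn.CrossEntropyLoss deviates from the plain cross entropy of equation (7):

import torch
import torch.nn as nn

# Hypothetical probability tensors standing in for x1 and shift_x2.
probs_pred = torch.softmax(torch.randn(4, 10), dim=-1)
probs_target = torch.softmax(torch.randn(4, 10), dim=-1).detach()

criterion = nn.CrossEntropyLoss()

# What the current code effectively computes: the probabilities are treated as
# logits, so CrossEntropyLoss applies another log_softmax to them internally.
loss_as_in_code = criterion(probs_pred, probs_target)

# Plain soft cross entropy, -sum(target * log(pred)), averaged over the batch.
loss_expected = -(probs_target * torch.log(probs_pred)).sum(dim=-1).mean()

print(loss_as_in_code.item(), loss_expected.item())  # the two values differ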

@suncerock

I am having the same issue here.

In my opinion, KL divergence should have the same effect as the cross entropy loss: since the target is detached in the code, the two losses differ only by the entropy of the target. However, replacing the cross entropy loss with KL divergence makes the model fail to converge.

The reason might be numerical issues in PyTorch, or, as mentioned above, the misuse of nn.CrossEntropyLoss, or other factors...
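
A small sanity check of that claim (hypothetical tensors, not the repository's code): with a detached target, soft cross entropy and KL divergence differ only by the target's entropy, which is constant with respect to the logits, so their gradients should coincide:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)
target = torch.softmax(torch.randn(4, 10), dim=-1).detach()

log_pred = F.log_softmax(logits, dim=-1)

# Soft cross entropy: H(target, pred) = -sum(target * log pred), averaged over the batch.
ce = -(target * log_pred).sum(dim=-1).mean()

# KL divergence: KL(target || pred) = sum(target * (log target - log pred)).
kl = F.kl_div(log_pred, target, reduction="batchmean")

# The difference ce - kl is the (negative) entropy of the detached target,
# which does not depend on the logits, so the gradients should match.
grad_ce, = torch.autograd.grad(ce, logits, retain_graph=True)
grad_kl, = torch.autograd.grad(kl, logits)
print(torch.allclose(grad_ce, grad_kl, atol=1e-6))  # expected: True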
