ShiftCrossEntropy currently uses nn.CrossEntropyLoss as its backend, which expects its input to be unnormalized logits. It appears that ShiftCrossEntropy passes input probabilities and target probabilities to the backend instead (see pesto-full/src/losses/entropy.py, line 49 at commit 229f78b). This might lead to a deviation from the behavior described in equation (7) of the paper.
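A minimal sketch of what I mean (my own example, not the repository's code), assuming soft probability targets as supported by recent PyTorch versions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 10)                            # unnormalized scores
probs = logits.softmax(dim=-1)                         # normalized probabilities
target = torch.randn(4, 10).softmax(dim=-1).detach()   # soft target distribution

# Cross entropy as in eq. (7): -sum_k t_k * log p_k
ce_expected = -(target * probs.log()).sum(dim=-1).mean()

# Correct usage of the backend: pass the unnormalized logits.
ce_from_logits = F.cross_entropy(logits, target)

# Suspected current behavior: probabilities are passed as the input,
# so log_softmax is applied a second time and the result no longer
# matches the intended cross entropy.
ce_from_probs = F.cross_entropy(probs, target)

print(ce_expected.item(), ce_from_logits.item(), ce_from_probs.item())
# ce_expected and ce_from_logits agree (up to precision); ce_from_probs differs
```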
In my opinion, KL divergence should have the same effect as the cross-entropy loss: since the target is detached in the code, the two losses differ only by the entropy of the target, which is constant with respect to the model's parameters. However, replacing the cross-entropy loss with KL divergence makes the model fail to converge.
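As a sanity check (a sketch under my assumptions, not the repository's loss), the gradients of the two losses with respect to the logits should be identical when the target is detached:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 10, requires_grad=True)
target = torch.randn(4, 10).softmax(dim=-1).detach()   # detached soft target

log_p = F.log_softmax(logits, dim=-1)

# Cross entropy: -sum_k t_k * log p_k
ce = -(target * log_p).sum(dim=-1).mean()

# KL divergence: sum_k t_k * (log t_k - log p_k) = CE - H(t)
kl = F.kl_div(log_p, target, reduction="batchmean")

# H(t) does not depend on the logits, so both losses give the same gradients.
g_ce, = torch.autograd.grad(ce, logits, retain_graph=True)
g_kl, = torch.autograd.grad(kl, logits)
print(torch.allclose(g_ce, g_kl, atol=1e-6))  # expected: True
```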
The reason might be numerical issues in PyTorch, the misuse of nn.CrossEntropyLoss mentioned above, or other factors...
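For example (hypothetical numbers, just to illustrate the kind of numerical issue I have in mind), taking the log of probabilities that have already underflowed to zero produces -inf, whereas log_softmax on the raw logits stays finite:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[60.0, 0.0, -60.0]])
probs = logits.softmax(dim=-1)

print(probs.log())                    # the underflowed probability becomes log(0) = -inf
print(F.log_softmax(logits, dim=-1))  # approximately [0., -60., -120.], all finite
```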