Hi there! I was recently working on implementing a custom loss function (binary focal loss) and found some of the documentation to be a bit confusing. The documentation of `Loss.get_grad` states that it should:
> Calculate the gradient of the loss with respect to the model outputs.
However, looking at the implementation of some of Thinc's built-in loss functions, `Loss.get_grad` actually calculates the gradient of the loss with respect to the logits used as input to the preceding softmax/sigmoid layer. For example, the `CategoricalCrossentropy` loss class computes the gradient as `guesses - targets`. This differs from the derivative of the loss with respect to the model outputs (the probabilities) by a factor of `1 / (p * (1 - p))`; that factor is cancelled by the derivative of the logistic function when differentiating with respect to the logits instead, leaving exactly `guesses - targets`.
This whole setup works because the softmax activation inside the softmax layer uses the identity function as its backward pass. Viewed in isolation that makes the layer's forward and backward passes inconsistent, but in combination with the loss everything balances out.
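To make the cancellation concrete, here is a small numerical check for the binary (sigmoid) case. This is a standalone NumPy sketch, not Thinc's code; `x`, `y`, and `p` are just illustrative names for the logits, the targets, and the model outputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([1.5, -0.3, 2.0])   # logits fed into the sigmoid
y = np.array([1.0, 0.0, 0.0])    # binary targets
p = sigmoid(x)                   # model outputs (probabilities)

# Derivative of binary cross-entropy with respect to the outputs p:
#   dL/dp = (p - y) / (p * (1 - p))
grad_wrt_outputs = (p - y) / (p * (1 - p))

# Chain rule through the sigmoid (dp/dx = p * (1 - p)) cancels that factor,
# leaving the familiar "guesses - targets":
grad_wrt_logits = grad_wrt_outputs * p * (1 - p)
assert np.allclose(grad_wrt_logits, p - y)

# Finite-difference check that p - y really is dL/dx:
def loss(x):
    p = sigmoid(x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

eps = 1e-6
numeric = (loss(x + eps) - loss(x - eps)) / (2 * eps)
assert np.allclose(numeric, p - y, atol=1e-5)
```

So a `get_grad` that returns `guesses - targets` is returning the gradient with respect to the logits, and it only produces the right updates if the sigmoid/softmax layer passes that gradient through unchanged.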
I assume that this setup was selected to help improve numerical stability. The focal loss paper actually mentions this explicitly:
> we note that the implementation of the loss layer combines the sigmoid operation for computing p with the loss computation, resulting in greater numerical stability.
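As a quick illustration of the stability point (again a generic NumPy sketch, not Thinc's implementation): computing the loss from the probability `sigmoid(x)` can blow up for extreme logits, while folding the sigmoid into the loss keeps everything finite.

```python
import numpy as np

x = np.array([-800.0])  # an extreme negative logit, with target y = 1

# Naive: compute the probability first, then the loss.
# np.exp(800) overflows to inf (NumPy warns), so p collapses to exactly 0.0
# and -log(p) becomes inf.
p = 1.0 / (1.0 + np.exp(-x))
naive_loss = -np.log(p)
print(naive_loss)       # [inf]

# Combined: -log(sigmoid(x)) == log(1 + exp(-x)) == logaddexp(0, -x),
# which is evaluated stably and stays finite.
stable_loss = np.logaddexp(0.0, -x)
print(stable_loss)      # [800.]
```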
Anyways, the point of this issue is that the current documentation of `Loss.get_grad` is confusing, since the gradient is actually computed with respect to the logits, and not the model outputs, even though the model outputs are what is provided to the method. It would be great to have this clarified in the documentation 🙂
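For illustration, here is a minimal sketch of a binary focal loss gradient that follows this convention, i.e. it takes the model outputs (sigmoid probabilities) but returns the gradient with respect to the logits. It is plain NumPy rather than Thinc's API, the paper's alpha weighting is omitted, and a real version would subclass Thinc's `Loss` and handle normalisation.

```python
import numpy as np

def binary_focal_loss_grad(guesses, targets, gamma=2.0):
    """Gradient of binary focal loss with respect to the *logits*,
    computed from the sigmoid outputs, mirroring the convention used
    by Thinc's built-in losses."""
    p = np.clip(guesses, 1e-7, 1.0 - 1e-7)     # guard the log against 0/1
    p_t = np.where(targets == 1, p, 1.0 - p)   # probability of the true class
    sign = np.where(targets == 1, 1.0, -1.0)   # dp_t/dx = sign * p_t * (1 - p_t)
    # d/dx [ -(1 - p_t)^gamma * log(p_t) ]:
    return sign * (1.0 - p_t) ** gamma * (gamma * p_t * np.log(p_t) - (1.0 - p_t))

# With gamma=0 this reduces to plain cross-entropy's "guesses - targets":
p = np.array([0.9, 0.2, 0.7])
y = np.array([1.0, 0.0, 1.0])
assert np.allclose(binary_focal_loss_grad(p, y, gamma=0.0), p - y)
```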
Thanks for maintaining Thinc! 😄
Your Environment
- Operating System: macOS 13.5
- Python Version Used: 3.9.16
- Thinc Version Used: 8.1.10
- Environment Information: Poetry virtual environment, M1 Mac