
ENH deal with tiny loss improvements in line search #724

Merged
2 commits merged into Quantco:main on Nov 8, 2023

Conversation

@lorentzenchr (Contributor) commented on Nov 2, 2023

This PR adds a second convergence check, based on gradients, to the line search. It can deal with tiny loss improvements; a rough sketch of the idea is shown below.

This is taken from https://github.com/scikit-learn/scikit-learn/blob/9621539a9defe86ff55c890d5f2475f42697604f/sklearn/linear_model/_glm/_newton_solver.py#L263.

This might help with some of the tests in #723.
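As a minimal sketch of the mechanism (illustrative only, not the actual glum or scikit-learn code; the function names, signatures, and the `tiny_loss_tol` tolerance are assumptions): a backtracking step is accepted either on a sufficient loss decrease (Armijo) or, as a fallback, when the loss change is within floating-point noise but the gradient norm shrinks.

```python
import numpy as np

def line_search(loss, grad, coef, step, loss_old, grad_old,
                sigma=1e-4, beta=0.5, max_iter=20, tiny_loss_tol=None):
    """Backtracking line search with a second, gradient-based acceptance check.

    `loss` and `grad` are callables returning the objective and its gradient;
    all names and tolerances here are illustrative assumptions.
    """
    if tiny_loss_tol is None:
        # loss improvements below this are indistinguishable from rounding error
        tiny_loss_tol = 16 * np.finfo(float).eps * abs(loss_old)

    armijo = sigma * grad_old @ step  # expected decrease for the full step (negative)
    t = 1.0
    for _ in range(max_iter):
        coef_new = coef + t * step
        loss_new = loss(coef_new)

        # 1) classic Armijo condition on the loss
        if loss_new <= loss_old + t * armijo:
            return coef_new, loss_new

        # 2) fallback: loss change is tiny, but the gradient got smaller
        if abs(loss_new - loss_old) <= tiny_loss_tol:
            grad_new = grad(coef_new)
            if np.linalg.norm(grad_new, ord=1) < np.linalg.norm(grad_old, ord=1):
                return coef_new, loss_new

        t *= beta  # backtrack
    return coef, loss_old  # no acceptable step found
```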

Checklist

  • Added a CHANGELOG.rst entry

@MarcAntoineSchmidtQC (Member) left a comment

I'm good with this change. It does not have any negative performance impact on our current benchmarks, and the logic makes sense. The new conditional branch is rarely taken in our current golden-master tests, but is used quite often in the test_glm.py tests that @lorentzenchr provided.

Two small requests before merging:

  • Is it possible to provide a source for the logic behind the tiny loss branch?
  • Can you add an entry to the changelog in the "Unreleased" section?

@lorentzenchr (Contributor, Author)

Is it possible to provide a source for the logic behind the tiny loss branch?

I developed this in scikit-learn/scikit-learn#24637, trying to satisfy my own (very strict) tests. I can't provide any literature on this.
The basic insight is as follows:

  • In general, the first-order condition, also known as the score equation, is a better criterion for checking convergence than loss improvements or step sizes (changes in coefficients), because we know its absolute scale: it should be zero.
  • Empirically, if we are already close to the minimum and are seeking a high-precision solution, the floating-point precision of the loss might not be enough while the gradient has not yet reached, say, 1e-12 or lower. The gradient therefore carries more information in such a setting.
  • The line search may then reject a (Newton) step that brings the gradient closer to zero but yields no improvement of the loss (and no worsening either, up to machine precision); the numerical sketch after this list illustrates the effect.
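
As a loose numerical illustration of that last point (not code from this PR; the shifted quadratic is a made-up example): near the minimum, an exact Newton step drives the gradient to zero while the loss does not change at all in float64.

```python
import numpy as np

# Shifted 1-D quadratic, f(x) = 1.0 + 0.5 * x**2, with minimum value 1.0 at x = 0.
def f(x):
    return 1.0 + 0.5 * x**2

def grad(x):
    return x

x_old = 1e-9                 # already very close to the optimum
x_new = x_old - grad(x_old)  # exact Newton step (f'' == 1) lands on the minimum

print(f(x_new) - f(x_old))                 # 0.0 -- below the float64 resolution of f
print(abs(grad(x_new)), abs(grad(x_old)))  # 0.0 vs 1e-09 -- a clear improvement
```

A pure loss-based (Armijo) check would see zero improvement here and reject the step, while a gradient-based check accepts it.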

@MarcAntoineSchmidtQC (Member)

Thanks @lorentzenchr!

@MarcAntoineSchmidtQC merged commit a304d55 into Quantco:main on Nov 8, 2023
13 checks passed
@lorentzenchr deleted the linear_search branch on November 10, 2023, 10:44