Refactoring proposal for distribution class / alternative options for using Hessian #51

Open
stefan-schroedl opened this issue Jul 8, 2015 · 0 comments

The function API in the current distribution class takes per-instance vectors for the target (adY), the current ensemble value (adF), the tree value (adFadj), the gradient (adZ), the weights (adW), and the offset (adOffset). First, ComputeWorkingResponse is called to calculate the gradient; the gradient is then passed to FitBestConstant.

Although FitBestConstant is implemented separately in every distribution, the implementations are quite similar: a numerator array accumulates the sum of gradients per terminal node, and a denominator array accumulates the sum of the diagonal Hessian entries (computed here). The final predicted value for a node is the ratio of the two.
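
To make the shared pattern concrete, here is a minimal sketch of the numerator/denominator accumulation in Python. It is not the package's actual code: the function name `fit_best_constant`, its arguments, the use of weights in both sums, and the Bernoulli-style gradient and Hessian in the example are all assumptions for illustration.

```python
def fit_best_constant(gradients, hessians, weights, node_ids, num_nodes):
    """Return one value per terminal node: sum(w * g) / sum(w * h).

    All names here are hypothetical; node_ids[i] is the terminal node
    that observation i falls into.
    """
    numerator = [0.0] * num_nodes
    denominator = [0.0] * num_nodes
    for g, h, w, node in zip(gradients, hessians, weights, node_ids):
        numerator[node] += w * g      # weighted sum of gradients
        denominator[node] += w * h    # weighted sum of Hessian diagonals
    # Guard against an empty or zero-Hessian node (an assumed convention).
    return [n / d if d != 0.0 else 0.0 for n, d in zip(numerator, denominator)]


# Bernoulli-style example: gradient y - p, Hessian p * (1 - p).
y = [1.0, 0.0, 1.0, 1.0]
p = [0.5, 0.5, 0.8, 0.5]
gradients = [yi - pi for yi, pi in zip(y, p)]
hessians = [pi * (1.0 - pi) for pi in p]
node_values = fit_best_constant(gradients, hessians, [1.0] * 4, [0, 0, 1, 1], 2)
print(node_values)
```

In the first node the two gradients cancel, so the fitted value is zero; in the second node the value is the summed gradient divided by the summed Hessian.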

Proposal: if we changed the interfaces of ComputeWorkingResponse and FitBestConstant to include the Hessian as well, it might be possible to reuse a single implementation of FitBestConstant across distributions. Moreover, this would make it easier to offer options to use the Hessian differently, or not at all.
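
One way the proposed interface could look, sketched in Python rather than the package's own language: each distribution only reports a gradient and a Hessian diagonal, and a single shared routine turns them into node values. The class names, `gradient_and_hessian`, and the function signatures are hypothetical, not the existing API.

```python
import math


class Gaussian:
    def gradient_and_hessian(self, y, f):
        # Squared error: gradient is the residual, Hessian is constant.
        return y - f, 1.0


class Bernoulli:
    def gradient_and_hessian(self, y, f):
        p = 1.0 / (1.0 + math.exp(-f))
        return y - p, p * (1.0 - p)


def compute_working_response(dist, ys, fs, offsets):
    """Return per-instance gradients and Hessian diagonals together."""
    grads, hess = [], []
    for y, f, o in zip(ys, fs, offsets):
        g, h = dist.gradient_and_hessian(y, f + o)
        grads.append(g)
        hess.append(h)
    return grads, hess


def fit_best_constant(grads, hess, weights, node_ids, num_nodes):
    """Single shared implementation: per-node sum(w * g) / sum(w * h)."""
    num = [0.0] * num_nodes
    den = [0.0] * num_nodes
    for g, h, w, node in zip(grads, hess, weights, node_ids):
        num[node] += w * g
        den[node] += w * h
    return [n / d if d != 0.0 else 0.0 for n, d in zip(num, den)]


# The same fitting routine serves both distributions.
for dist in (Gaussian(), Bernoulli()):
    grads, hess = compute_working_response(dist, [1.0, 3.0], [0.0, 0.0], [0.0, 0.0])
    print(type(dist).__name__, fit_best_constant(grads, hess, [1.0, 1.0], [0, 0], 1))
```

With a constant Hessian of 1, the shared routine reduces to the weighted mean residual for the Gaussian case. Distributions whose terminal-node value is not a gradient/Hessian ratio would presumably still need their own path.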

While the Newton step helps find a good solution quickly, the final model can sometimes be better when using gradients alone (or, as a compromise, limiting/capping the gradients). Small Hessian values can easily lead to overfitting, since they sit in the denominator and inflate the node value. I realize such a cap is implemented for the Bernoulli distribution, but we could make the procedure generally applicable to all distributions, or give the user an option to use only gradients (for the initial trees) ... Thoughts?
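
As a rough illustration of what such options might look like, the sketch below computes a node value from per-node sums in three ways: the plain Newton ratio, a gradient-only average, and a Newton step clipped to a maximum magnitude. The function `node_value`, the mode names, and the `max_step` default are assumptions for this example; the clipping shown here is one possible reading of "cap" and is not claimed to match the existing Bernoulli cap.

```python
def node_value(grad_sum, hess_sum, weight_sum, mode="newton", max_step=4.0):
    """Terminal-node value from per-node sums (hypothetical helper).

    mode="newton":   grad_sum / hess_sum
    mode="gradient": grad_sum / weight_sum (Hessian ignored)
    mode="capped":   Newton step clipped to [-max_step, max_step]
    """
    if mode not in ("newton", "gradient", "capped"):
        raise ValueError("unknown mode: " + mode)
    if mode == "gradient":
        return grad_sum / weight_sum
    step = grad_sum / hess_sum if hess_sum > 0.0 else 0.0
    if mode == "capped":
        return max(-max_step, min(max_step, step))
    return step


# A node with three confidently wrong Bernoulli predictions:
# y = 0, p = 0.99, so gradient = -0.99 and Hessian = 0.99 * 0.01 each.
p = 0.99
grad_sum = 3 * (0.0 - p)
hess_sum = 3 * p * (1.0 - p)
weight_sum = 3.0

for mode in ("newton", "gradient", "capped"):
    print(mode, node_value(grad_sum, hess_sum, weight_sum, mode=mode))
```

Here the tiny Hessian sum makes the unconstrained Newton value roughly a hundred times larger in magnitude than the average gradient, which is the kind of step the cap or the gradient-only option would rein in.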
