The API of the current distribution class passes per-observation vectors for the target (adY), the current ensemble prediction (adF), the tree's adjustment (adFadj), the gradient (adZ), the weights (adW), and the offset (adOffset). Each iteration first calls ComputeWorkingResponse to compute the gradient, and the result is then passed on to FitBestConstant.
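For orientation, here is a simplified, illustrative shape of that interface using the array names above. The real gbm headers carry additional parameters, so treat this as a sketch rather than the actual declarations:

```cpp
// Simplified, illustrative shape of the current per-distribution interface
// (the real gbm methods take more arguments than shown here).
class CDistribution {
public:
    virtual ~CDistribution() {}

    // Fills adZ with the gradient of the distribution's loss, evaluated at
    // the current ensemble prediction adF (plus adOffset), per observation.
    virtual void ComputeWorkingResponse(const double *adY, const double *adOffset,
                                        const double *adF, double *adZ,
                                        const double *adW, unsigned long nTrain) = 0;

    // Replaces each terminal node's prediction with a distribution-specific
    // constant; today each distribution recomputes its Hessian diagonal here.
    virtual void FitBestConstant(const double *adY, const double *adOffset,
                                 const double *adF, const double *adZ,
                                 const double *adW, const long *aiNodeAssign,
                                 unsigned long nTrain, unsigned long cTermNodes,
                                 double *adNodePred) = 0;
};
```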
Although FitBestConstant is implemented separately in every distribution, the implementations are very similar each time: a numerator array accumulates the sum of gradients per terminal node, a denominator array accumulates the corresponding diagonal entries of the Hessian (which are computed inside FitBestConstant itself), and the node's predicted constant is the ratio of the two.
Proposal: If we changed the interfaces of ComputeWorkingResponse and FitBestConstant to include the Hessian as well, it might be possible to share a single implementation of FitBestConstant across distributions (see the sketch below). Moreover, this would make it easier to offer options that use the Hessian differently, or not at all.
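A minimal sketch of what such a shared routine could look like, assuming the distribution fills per-observation arrays adZ (gradient) and adH (Hessian diagonal) before the constant-fitting step; the function and parameter names beyond those mentioned in the issue are hypothetical:

```cpp
#include <vector>

// Hypothetical shared FitBestConstant: each distribution would only supply
// adZ (gradient) and adH (Hessian diagonal); the per-node Newton step itself
// is identical for every distribution.
void FitBestConstantShared(const double *adZ,          // per-observation gradient
                           const double *adH,          // per-observation Hessian diagonal
                           const double *adW,          // observation weights
                           const long   *aiNodeAssign, // terminal node per observation
                           unsigned long nTrain,
                           unsigned long cTermNodes,
                           double       *adNodePred)   // output: constant per node
{
    std::vector<double> adNum(cTermNodes, 0.0);  // sum of weighted gradients
    std::vector<double> adDen(cTermNodes, 0.0);  // sum of weighted Hessian diagonals

    for (unsigned long i = 0; i < nTrain; i++) {
        const long k = aiNodeAssign[i];
        adNum[k] += adW[i] * adZ[i];
        adDen[k] += adW[i] * adH[i];
    }
    for (unsigned long k = 0; k < cTermNodes; k++) {
        // Newton step: gradient sum divided by Hessian sum.
        adNodePred[k] = (adDen[k] > 0.0) ? adNum[k] / adDen[k] : 0.0;
    }
}
```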
While the Newton step converges to a good solution quickly, the final model can sometimes be better when fitted on gradients alone (or, as a compromise, with the gradient steps limited/capped). Small Hessian values can easily lead to overfitting. I realize such a cap is already implemented for the Bernoulli distribution, but we could make the procedure generally applicable to all distributions, or give the user an option to use only gradients (e.g. for the initial trees) ... Thoughts?
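To make that option concrete, the shared routine above could take a couple of switches, roughly along these lines. The option names and the exact form of the cap are illustrative only (this does not reproduce the Bernoulli cap currently in gbm; it simply floors the denominator, which is one way to keep small Hessians from inflating the node constant):

```cpp
#include <algorithm>
#include <vector>

// Illustrative variation on the shared Newton step:
//  - useHessian = false falls back to a plain gradient update,
//  - dMinHessian puts a floor under the denominator so nearly-flat
//    Hessians cannot blow up the node prediction (overfitting guard).
void FitBestConstantFlexible(const double *adZ, const double *adH,
                             const double *adW, const long *aiNodeAssign,
                             unsigned long nTrain, unsigned long cTermNodes,
                             double *adNodePred,
                             bool useHessian, double dMinHessian)
{
    std::vector<double> adNum(cTermNodes, 0.0);
    std::vector<double> adDen(cTermNodes, 0.0);

    for (unsigned long i = 0; i < nTrain; i++) {
        const long k = aiNodeAssign[i];
        adNum[k] += adW[i] * adZ[i];
        // Gradient-only mode treats every observation's curvature as 1,
        // so the node constant reduces to a weighted mean gradient.
        adDen[k] += adW[i] * (useHessian ? adH[i] : 1.0);
    }
    for (unsigned long k = 0; k < cTermNodes; k++) {
        const double den = std::max(adDen[k], dMinHessian);
        adNodePred[k] = (den > 0.0) ? adNum[k] / den : 0.0;
    }
}
```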