New model request: gpboost Tree-Boosting with Gaussian Process and Mixed Effects Models #47

Open
schelhorn opened this issue Oct 20, 2022 · 3 comments
Labels
feature a feature request or enhancement

Comments

schelhorn commented Oct 20, 2022

The gpboost package on CRAN by @fabsig describes itself as follows:

Combining Tree-Boosting with Gaussian Process and Mixed Effects Models
An R package that allows for combining tree-boosting with Gaussian process and mixed effects models. It also allows for independently doing tree-boosting as well as inference and prediction for Gaussian process and mixed effects models. See https://github.com/fabsig/GPBoost for more information on the software and Sigrist (2020) <arXiv:2004.02653> and Sigrist (2021) <arXiv:2105.08966> for more information on the methodology.

I think it would make a nice addition to {multilevelmod} because it can model non-linear relationships and works well with high-cardinality categorical data.

From the abstract of the paper:

We introduce a novel way to combine boosting with Gaussian process and mixed effects models. This allows for relaxing, first, the zero or linearity assumption for the prior mean function in Gaussian process and grouped random effects models in a flexible non-parametric way and, second, the independence assumption made in most boosting algorithms. The former is advantageous for prediction accuracy and for avoiding model misspecifications. The latter is important for efficient learning of the fixed effects predictor function and for obtaining probabilistic predictions. Our proposed algorithm is also a novel solution for handling high-cardinality categorical variables in tree-boosting. In addition, we present an extension that scales to large data using a Vecchia approximation for the Gaussian process model relying on novel results for covariance parameter inference. We obtain increased prediction accuracy compared to existing approaches on multiple simulated and real-world data sets.

And from the main text of the paper:

In summary, both the linearity assumption in Gaussian process models and the independence assumption in boosting are often questionable. The goal of this article is to relax these restrictive assumptions by combining boosting with Gaussian process and mixed effects models. Specifically, we propose to model the mean function using an ensemble of base learners, such as regression trees (Breiman et al., 1984), learned in a stage-wise manner using boosting, and the second-order structure is modeled using a Gaussian process or mixed effects model. In doing so, the parameters of the covariance function are estimated jointly with the mean function; see Section 2 for more details.
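For readers who prefer notation, the model described in that passage can be sketched roughly as below. This is my reading of the paper, not a quote from it: the symbols ($F$ for the boosted mean function, $Z$ for the random-effects design matrix, $b$ for the random effects, $\theta$ for the covariance parameters) are assumptions chosen for illustration, and only the Gaussian-response case is shown.

```latex
% Mixed effects model with a non-parametric (boosted) mean function
y = F(X) + Z b + \varepsilon, \qquad
b \sim \mathcal{N}(0, \Sigma_\theta), \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)

% Implied marginal distribution of the response
y \sim \mathcal{N}\bigl(F(X), \Psi_\theta\bigr), \qquad
\Psi_\theta = Z \Sigma_\theta Z^\top + \sigma^2 I_n

% Joint estimation: minimize the negative log-likelihood over F and theta
(\hat{F}, \hat{\theta}) = \arg\min_{F,\,\theta}\;
\tfrac{1}{2} \bigl(y - F(X)\bigr)^\top \Psi_\theta^{-1} \bigl(y - F(X)\bigr)
+ \tfrac{1}{2} \log\det \Psi_\theta
+ \tfrac{n}{2} \log(2\pi)
```

Setting $F(X) = X\beta$ recovers the usual linear mixed effects model, and dropping $Zb$ recovers ordinary boosting with independent errors, which is how I understand the two assumptions being relaxed.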

The paper is very well written, and the package is actively developed on GitHub, with the last commit two months ago. Multiple usage examples are linked here, the most comprehensive being this one. Model hyperparameters are explained here.

From the documentation, I believe it can work with the following responses:
regression, regression_l1, huber, binary, lambdarank, multiclass

fabsig commented Oct 20, 2022

@schelhorn: many thanks for this suggestion!

Just a small clarification: currently, GPBoost supports the following response distributions: gaussian, bernoulli_probit (= binary), bernoulli_logit, poisson, gamma; see here for a list of currently supported likelihoods.

hfrick added the "feature" label (a feature request or enhancement) on Dec 19, 2022
hfrick (Member) commented Nov 1, 2023

Thank you for the detailed issue with the references 🙌 It's sitting here until the next round of triaging/implementing new models but it hasn't fallen off the radar.

tdemarchin commented

Hi, upvoting this, as I would be very interested in having GPBoost included in the tidymodels panel.
