Author: Jiang Chen ([email protected])
GBDT is a high-performance, full-featured C++ implementation of Jerome H. Friedman's Gradient Boosting Decision Trees algorithm and its modern offspring. It features high efficiency, a low memory footprint, a collection of loss functions, and built-in mechanisms to handle categorical features and missing values.
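To make the core idea concrete, here is a minimal pure-Python sketch of Friedman's gradient boosting for squared loss. It uses depth-1 trees (stumps) as the weak learner. This is not GBDT's C++ implementation or its Python API. The function names (`fit_stump`, `gradient_boost`, `predict`), the stump-only trees, and the default learning rate are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical stump layout: (feature index, threshold, value if row[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Exhaustively find the one-split tree that minimizes squared error on the residuals."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, threshold, lv, rv), err
    if best is None:  # no valid split: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best


def predict_stump(stump: Stump, row: List[float]) -> float:
    j, threshold, lv, rv = stump
    return lv if row[j] <= threshold else rv


def gradient_boost(X, y, n_rounds=50, learning_rate=0.1):
    """Squared loss L = (y - F)^2 / 2, so the negative gradient is the residual y - F."""
    base = sum(y) / len(y)  # F_0: the constant that minimizes squared loss
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # negative gradients
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row: List[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0], [4.0]]
    y = [1.0, 1.0, 5.0, 5.0]
    model = gradient_boost(X, y)
    print([round(predict(model, row), 2) for row in X])
```

Each round fits a small tree to the current residuals and adds a shrunken copy of it to the ensemble. A real implementation grows deeper trees and searches splits far more efficiently than this exhaustive scan.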
- You are looking beyond linear models.
  - Gradient Boosting Decision Trees is one of the best off-the-shelf ML algorithms, with built-in capabilities for non-linear transformation and feature crossing.
- Your data is too big to load into memory with existing ML packages.
  - GBDT reduces its memory footprint dramatically through feature bucketization. On some tested datasets, it used 1/7 of the memory of its counterpart and took only half the time to train. See docs/PERFORMANCE_BENCHMARK.md for more details.
- You want better handling of categorical features and missing values.
  - GBDT has built-in mechanisms to figure out how to split categorical features and where to place missing values in the trees.
- You want to try different loss functions.
  - GBDT implements various pointwise, pairwise and listwise loss functions, including mse, logloss, huberized hinge loss, pairwise logloss, GBRank and LambdaMart. It also makes it easy to add your own custom loss functions.
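Feature bucketization saves memory by storing a small integer bin code per row instead of a raw floating-point value. The sketch below shows one common way to do this. It is not GBDT's actual scheme: the 256-bin limit, the quantile cut rule, and reserving bin 0 for missing values are assumptions made for illustration, and `compute_bin_edges` and `bucketize` are hypothetical names.

```python
import bisect
import math
from array import array
from typing import List, Optional, Sequence

MISSING_BIN = 0  # assumption: bin 0 is reserved for missing values


def is_missing(v: Optional[float]) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def compute_bin_edges(values: Sequence[Optional[float]], max_bins: int = 256) -> List[float]:
    """Return sorted edges so non-missing values map to bins 1..max_bins-1 (max_bins <= 256)."""
    present = sorted(v for v in values if not is_missing(v))
    n_value_bins = max_bins - 1  # one bin is kept for missing values
    distinct = sorted(set(present))
    if len(distinct) <= n_value_bins:
        return distinct[:-1]  # few distinct values: one bin per value
    # Otherwise cut at quantiles so each bin holds roughly the same number of rows.
    cuts = {present[(i * len(present)) // n_value_bins] for i in range(1, n_value_bins)}
    return sorted(cuts)


def bucketize(values: Sequence[Optional[float]], edges: List[float]) -> array:
    """Encode each value as a one-byte code: 1 + the number of edges strictly below it."""
    codes = array("B")  # unsigned char: 1 byte per row instead of 8 for a float64
    for v in values:
        codes.append(MISSING_BIN if is_missing(v) else 1 + bisect.bisect_left(edges, v))
    return codes


if __name__ == "__main__":
    column = [3.0, None, 1.0, 2.0, float("nan"), 2.0]
    print(list(bucketize(column, compute_bin_edges(column))))
```

Once every feature is stored as bin codes, split finding only needs to scan at most 256 candidate thresholds per feature rather than every distinct raw value.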
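The README does not say exactly how GBDT decides where missing values go. One well-known approach is XGBoost's "sparsity-aware" default direction, used here purely as a stand-in. For each candidate split, send the whole missing-value bucket left, then right, and keep whichever choice gives the larger gain. The sketch assumes bucketized inputs with bin 0 meaning "missing", per-row gradients and Hessians, and an XGBoost-style gain with L2 regularization `reg_lambda`. `best_split_with_missing` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def best_split_with_missing(
    bins: List[int], grads: List[float], hess: List[float], n_bins: int, reg_lambda: float = 1.0
) -> Tuple[float, Optional[int], bool]:
    """Return (gain, threshold_bin, missing_goes_left); value bins <= threshold go left."""
    # Build per-bin gradient/Hessian histograms; bin 0 holds the missing rows.
    G = [0.0] * n_bins
    H = [0.0] * n_bins
    for b, g, h in zip(bins, grads, hess):
        G[b] += g
        H[b] += h
    g_miss, h_miss = G[0], H[0]
    g_total, h_total = sum(G), sum(H)

    def score(g: float, h: float) -> float:
        return g * g / (h + reg_lambda)

    parent = score(g_total, h_total)
    best: Tuple[float, Optional[int], bool] = (0.0, None, True)
    g_left = h_left = 0.0
    # Keep at least one value bin on each side of the threshold.
    for t in range(1, n_bins - 1):
        g_left += G[t]
        h_left += H[t]
        g_right = g_total - g_miss - g_left
        h_right = h_total - h_miss - h_left
        for missing_left in (True, False):
            gl = g_left + (g_miss if missing_left else 0.0)
            hl = h_left + (h_miss if missing_left else 0.0)
            gr = g_right + (0.0 if missing_left else g_miss)
            hr = h_right + (0.0 if missing_left else h_miss)
            gain = score(gl, hl) + score(gr, hr) - parent
            if gain > best[0]:
                best = (gain, t, missing_left)
    return best
```

Because the missing bucket is tried on both sides, the tree learns a default direction for missing values from the data instead of requiring imputation beforehand.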
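A boosting loss mainly needs to supply the gradient of the loss with respect to the model score F. The sketch below computes these gradients for four of the listed losses using common textbook definitions. These may differ from GBDT's exact formulations; in particular, the "huberized hinge" shown is the modified-Huber form on the margin z = yF. The function names are hypothetical and are not part of GBDT's API. GBRank and LambdaMart are not covered.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mse_gradient(y: float, f: float) -> float:
    # L = (f - y)^2 / 2
    return f - y


def logloss_gradient(y: float, f: float) -> float:
    # y in {0, 1}; L = log(1 + e^f) - y * f
    return sigmoid(f) - y


def huberized_hinge_gradient(y: float, f: float) -> float:
    # y in {-1, +1}; modified Huber on z = y*f: 0 if z >= 1, (1 - z)^2 if -1 <= z < 1, -4z otherwise
    z = y * f
    if z >= 1.0:
        dz = 0.0
    elif z >= -1.0:
        dz = -2.0 * (1.0 - z)
    else:
        dz = -4.0
    return y * dz  # chain rule: dz/df = y


def pairwise_logloss_gradients(scores: Sequence[float], pairs: Sequence[Tuple[int, int]]) -> List[float]:
    """Each pair (i, j) means item i should rank above j; L = sum log(1 + exp(-(s_i - s_j)))."""
    grads = [0.0] * len(scores)
    for i, j in pairs:
        g = sigmoid(-(scores[i] - scores[j]))  # magnitude of dL/ds_i for this pair
        grads[i] -= g  # pushing s_i up lowers the loss
        grads[j] += g  # pushing s_j down lowers the loss
    return grads
```

In gradient boosting, each new tree is fit to the negative of these gradients. A custom loss written in this style only has to provide its gradient, plus a second derivative if leaf values are computed with a Newton step.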
- Install the latest stable version:
pip install gbdt
- Install the latest development version:
pip install git+https://github.com/yarny/gbdt.git