HyperGBM

What is HyperGBM

HyperGBM is a library that supports full-pipeline AutoML, covering the end-to-end stages of data cleaning, preprocessing, feature generation and selection, model selection, and hyperparameter optimization. It is a real AutoML tool for tabular data.

Overview

Unlike most AutoML approaches, which focus on the hyperparameter optimization problem of machine learning algorithms, HyperGBM can put the entire process, from data cleaning to algorithm selection, into one search space for optimization. End-to-end pipeline optimization is more like a sequential decision process, so HyperGBM uses reinforcement learning, Monte Carlo Tree Search, and evolutionary algorithms combined with a meta-learner to solve such problems efficiently.
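To make the idea of a single search space more concrete, here is a minimal sketch of an evolutionary search over whole pipeline configurations. It is not HyperGBM or Hypernets code: the stage names, the options, and the toy_reward function are all made up for illustration, and only a simplified evolutionary loop is shown (no reinforcement learning, MCTS, or meta-learner). In a real run the reward would come from training and validating each candidate pipeline.

```python
import random

# Hypothetical joint search space: preprocessing choices, model choice and
# hyperparameters all live in the same dictionary (names are illustrative).
SEARCH_SPACE = {
    "imputer": ["mean", "median", "most_frequent"],
    "scaler": ["none", "standard", "minmax"],
    "feature_selection": [False, True],
    "model": ["xgboost", "lightgbm", "catboost"],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7, 9],
}


def sample_pipeline(rng):
    """Draw one complete pipeline configuration at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def mutate(config, rng):
    """Copy a configuration and resample exactly one of its choices."""
    child = dict(config)
    name = rng.choice(list(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child


def toy_reward(config):
    """Made-up stand-in for a cross-validated score (higher is better)."""
    score = {"mean": 0.1, "median": 0.2, "most_frequent": 0.0}[config["imputer"]]
    score += {"none": 0.0, "standard": 0.1, "minmax": 0.05}[config["scaler"]]
    score += 0.1 if config["feature_selection"] else 0.0
    score += {"xgboost": 0.2, "lightgbm": 0.3, "catboost": 0.25}[config["model"]]
    score -= abs(config["learning_rate"] - 0.05)
    score -= 0.02 * abs(config["max_depth"] - 5)
    return score


def evolutionary_search(n_trials=200, population_size=10, seed=0):
    """Simplified evolutionary loop: select a parent, mutate it, age out the oldest."""
    rng = random.Random(seed)
    population = [sample_pipeline(rng) for _ in range(population_size)]
    best = max(population, key=toy_reward)
    for _ in range(n_trials):
        # Tournament selection: the best of three random members is the parent.
        parent = max(rng.sample(population, 3), key=toy_reward)
        child = mutate(parent, rng)
        population.pop(0)        # drop the oldest member
        population.append(child)
        if toy_reward(child) > toy_reward(best):
            best = child
    return best


if __name__ == "__main__":
    print(evolutionary_search())
```

Because every stage is just another key in the configuration, a single mutation can change the imputer, the model, or a hyperparameter, which is what searching the whole pipeline at once means here.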

As the name implies, the ML algorithms used in HyperGBM are all GBM models, more precisely gradient boosting tree models, which currently include XGBoost, LightGBM, and CatBoost.
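For readers unfamiliar with the model family, the following is a minimal sketch of gradient boosting with one-split trees (stumps) on a single feature and squared-error loss. It is a teaching example only and is far simpler than what XGBoost, LightGBM, or CatBoost do; all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on x that minimizes squared error of the residuals."""
    best = None
    # Skipping the largest value guarantees the right side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    """Return the leaf value of the side of the split that x falls on."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_gbm(model, x):
    """Sum the base value and the shrunken contributions of all stumps."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [x * x for x in xs]
    model = fit_gbm(xs, ys)
    print([round(predict_gbm(model, x), 1) for x in xs])
```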

The underlying search space representation and search algorithms in HyperGBM are powered by the Hypernets project, a general AutoML framework.

Tutorial

Installation

pip install hypergbm

Examples

Users can quickly create an experiment instance with make_experiment and run it. train_data is the only required parameter; all others are optional. The target parameter is also required if your target feature name isn't y.

Code:

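from hypergbm import make_experiment
from tabular_toolbox.datasets import dsutils

# Load the blood dataset; its target column is named 'Class'.
train_data = dsutils.load_blood()
# The target must be passed explicitly because the column is not named 'y'.
experiment = make_experiment(train_data, target='Class')
# Run the experiment and get back the resulting estimator.
estimator = experiment.run()
print(estimator)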

Output:

Pipeline(steps=[('data_clean',
                 DataCleanStep(...)),
                ('estimator',
                 GreedyEnsemble(...))])

Process finished with exit code 0

HyperGBM also provides command-line tools to train models and predict data:

hypergbm -h

usage: hypergbm [-h] --train_file TRAIN_FILE [--eval_file EVAL_FILE]
                [--eval_size EVAL_SIZE] [--test_file TEST_FILE] --target
                TARGET [--pos_label POS_LABEL] [--max_trials MAX_TRIALS]
                [--model_output MODEL_OUTPUT]
                [--prediction_output PREDICTION_OUTPUT] [--searcher SEARCHER]
...

For example, to train a model on the dataset blood.csv:

hypergbm --train_file=blood.csv --test_file=blood.csv --target=Class --pos_label=1 --model_output=model.pkl
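If you want to launch the same command from a Python script, one possible approach is to assemble the argument list and hand it to subprocess. The helper below is a hypothetical convenience wrapper, not part of HyperGBM; it only formats flags in the --name=value style shown in the usage text above and does not check whether a given flag exists.

```python
import subprocess


def build_hypergbm_command(train_file, target, **options):
    """Assemble a hypergbm command line as a list of arguments.

    train_file and target are the two flags the usage text marks as required;
    any other keyword is passed through as --name=value without validation.
    """
    command = ["hypergbm", f"--train_file={train_file}", f"--target={target}"]
    for name, value in options.items():
        command.append(f"--{name}={value}")
    return command


if __name__ == "__main__":
    command = build_hypergbm_command(
        "blood.csv",
        "Class",
        test_file="blood.csv",
        pos_label=1,
        model_output="model.pkl",
    )
    print(" ".join(command))
    # Uncomment to actually run the tool (requires hypergbm to be installed):
    # subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.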

Hypernets related projects

HyperGBM: A full-pipeline AutoML tool that integrates various GBM models.
HyperDT/DeepTables: An AutoDL tool for tabular data.
HyperKeras: An AutoDL tool for Neural Architecture Search and Hyperparameter Optimization on TensorFlow and Keras.
Cooka: A lightweight interactive AutoML system.
Hypernets: A general automated machine learning framework.

DataCanvas

HyperGBM is an open source project created by DataCanvas.

