Biased Random Forest

This project provides a basic implementation of a Biased Random Forest (BRAF), an ensemble method that seeks to mitigate the effects of class imbalance in classification problems by "adequately representing the minority class during classification." The method was proposed by M. Bader-El-Den et al. in "Biased Random Forest For Dealing With the Class Imbalance Problem."
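Our reading of the paper is that BRAF first builds a "critical area": every minority-class sample plus, for each of those samples, its k nearest majority-class neighbours. A second forest is then trained on that area alone. The following is a minimal pure-Python sketch of the critical-area step under stated assumptions: Euclidean distance on already-scaled features, and a minority label of `1` (the diabetic class in the Pima `Outcome` column). The helpers `euclidean` and `critical_area` are hypothetical names, not functions from this project.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def critical_area(X: List[Sequence[float]], y: List[int], k: int,
                  minority_label: int = 1) -> List[int]:
    """Return the row indices of the critical area (hypothetical helper).

    The critical area holds every minority sample plus, for each minority
    sample, its k nearest majority-class neighbours (duplicates removed).
    Assumes minority_label=1 marks the minority class.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    selected = set(minority)
    for i in minority:
        # Rank all majority samples by distance to minority sample i
        # and keep the k closest ones.
        ranked = sorted(majority, key=lambda j: euclidean(X[i], X[j]))
        selected.update(ranked[:k])
    return sorted(selected)
```

With the sample settings used later in this README, the call would be `critical_area(X, y, k=100)`. The brute-force sort is O(n log n) per minority sample; a spatial index such as a KD-tree would scale better.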

Note: this work is for learning and proof-of-concept (POC) purposes only. The implementation needs further work before it is production-ready.

Dataset

Pima Indians Diabetes Database: "The datasets consist of several medical predictor (independent) variables and one target (dependent) variable, Outcome. Independent variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on." (Reference: Kaggle)
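Because BRAF targets class imbalance, it helps to check how skewed `Outcome` is before training. The following is a small sketch using only Python's `csv` module, assuming a CSV with a header row and an `Outcome` column. The file path `data/diabetes.csv` and the helper names are hypothetical; the project's actual loader may differ.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def read_features_and_target(lines: Iterable[str],
                             target: str = "Outcome") -> Tuple[List[List[float]], List[int]]:
    """Split CSV rows into numeric feature vectors and integer labels (hypothetical helper)."""
    rows = list(csv.DictReader(lines))
    X = [[float(v) for k, v in row.items() if k != target] for row in rows]
    y = [int(row[target]) for row in rows]
    return X, y


def class_balance(y: List[int]) -> Dict[int, float]:
    """Fraction of samples in each class, keyed by label."""
    counts = Counter(y)
    return {label: n / len(y) for label, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical path; point this at wherever the CSV lives in data/.
    with open("data/diabetes.csv", newline="") as f:
        X, y = read_features_and_target(f)
    print(class_balance(y))
```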

Setup

Python 3.6+
virtualenv (pip install virtualenv)
virtualenvwrapper (pip install virtualenvwrapper)

Install Required Libraries

mkvirtualenv braf (if Python 3.6+ is your default Python version), OR
Run which python3 to get the path of your Python 3 interpreter (e.g. /usr/local/bin/python3), then
mkvirtualenv -p [Path To Python3] braf
pip install -r requirements.txt

Possible Improvements

There are a number of improvements that could be made to this implementation including:

Try other feature scaling techniques (e.g. mean normalization). The current implementation uses min/max scaling.
Try cost functions other than the Gini index.
Learn the optimal hyperparameters, e.g. via grid search.
Implement unit tests.
Refactor to use a more object-oriented approach for modeling the trees, though this may cost some performance.
Refactor to improve general runtime and space complexity.
Accept user-specified hyperparameters as command-line arguments.
Add more model evaluation metrics (e.g. precision-recall and ROC curves, with AUPRC and AUROC).
Refactor to save trained models to disk and allow callers to run predictions against a saved BRAF model.
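The list above mentions min/max scaling, mean normalization, and the Gini index. The following is a short pure-Python sketch of all three for comparison. The function names are hypothetical and not taken from this project. Both scalers divide by the column range (max minus min) and map a constant column to zeros to avoid division by zero. The Gini function scores a candidate split given lists of class labels per branch.

```python
from typing import List, Sequence


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]


def mean_normalize(column: Sequence[float]) -> List[float]:
    """Center on the mean: (v - mean) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    mean = sum(column) / len(column)
    return [(v - mean) / span for v in column]


def gini_index(groups: List[List[int]], classes: Sequence[int]) -> float:
    """Size-weighted Gini impurity of a split; each group is a list of labels."""
    n = sum(len(g) for g in groups)
    score = 0.0
    for group in groups:
        size = len(group)
        if size == 0:
            continue  # an empty branch contributes nothing
        impurity = 1.0 - sum((group.count(c) / size) ** 2 for c in classes)
        score += impurity * size / n
    return score
```

A Gini index of 0.0 means each branch is pure. For two balanced classes, the worst case is 0.5.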

Quick EDA

From the root of the project directory, run: python run_pima_exploratory_data_analysis.py. This writes an HTML report generated by pandas-profiling to the eda_output directory.

Training & Model Evaluation

From the root of the project directory, run: python train.py. After training completes, it logs the mean accuracy, test precision, and test recall to standard output.
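For reference, the logged metrics follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and accuracy = correct / total, with the accuracy averaged over cross-validation folds. The following sketch assumes label `1` is the positive (minority) class. `evaluate` and `mean_metric` are hypothetical helpers, not functions from `train.py`.

```python
from typing import Dict, List, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision and recall for one fold (positive class assumed to be 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against folds with no predicted or no actual positives.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def mean_metric(fold_results: List[Dict[str, float]], name: str) -> float:
    """Average one metric across folds, e.g. the mean accuracy."""
    return sum(r[name] for r in fold_results) / len(fold_results)
```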

Local Training Performance Metrics Sample (2 Folds, Forest Size: 100, K-Neighbors: 100, Critical Areas Ratio: 0.5)

Test Precision: 0.83
Test Recall: 0.92

Note: more work is needed to make the training and evaluation processes more efficient, and further hyperparameter tuning is needed.
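Our reading of the paper is that the forest size s and critical areas ratio p split the ensemble. s × (1 − p) trees are trained on the full training set, and s × p trees are trained on the critical area. With the sample settings above (s = 100, p = 0.5), that gives 50 trees of each kind. The following sketch assumes that split and a simple majority vote over all trees as the combination rule. Trees are modeled as hypothetical predict callables, and `split_forest_size` and `braf_predict` are illustrative names, not this project's API.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: a fitted tree represented as a predict function.
Tree = Callable[[Sequence[float]], int]


def split_forest_size(forest_size: int, critical_ratio: float) -> Tuple[int, int]:
    """Return (trees on full data, trees on critical area) = (s*(1-p), s*p)."""
    critical_trees = round(forest_size * critical_ratio)
    return forest_size - critical_trees, critical_trees


def braf_predict(full_trees: List[Tree], critical_trees: List[Tree],
                 sample: Sequence[float]) -> int:
    """Majority vote over every tree in both forests (assumed combination rule)."""
    votes = [tree(sample) for tree in full_trees + critical_trees]
    # Counter.most_common resolves ties in favour of the label seen first in votes.
    return Counter(votes).most_common(1)[0][0]
```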

References

M. Bader-El-Den, E. Teitei and T. Perry, "Biased Random Forest For Dealing With the Class Imbalance Problem," in IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 7, pp. 2163-2172, July 2019, doi: 10.1109/TNNLS.2018.2878400.

tbrooks007/Biased-Random-Forest

Folders and files

Latest commit

History

Repository files navigation

Biased Random Forest

Dataset

Setup

Install Required Libraries

Possible Improvements

Quick EDA Analysis

Training & Model Evaluation

Local Training Performance Metrics Sample (2-Folds, Forest Size: 100, K-Neighbors=100, Critial Areas Ratio: 0.5)

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages