Implementation of the random forest ensemble learning method for classification.
pip3 install -r requirements.txt
Consider the United States Congressional Voting Records dataset, in data/house_votes_84.tsv.
The idea is to construct a decision tree from a training dataset. In this case, it creates a model to predict whether a person is going to eat in a restaurant or not. An example of a constructed tree is given right below.
The floating-point numbers are the information gains of each split.
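As a sketch of how such gains are computed (a generic formulation of entropy-based information gain, not necessarily the exact code in this repository):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(parent_labels, child_groups):
    """Entropy reduction obtained by splitting `parent_labels`
    into the label groups `child_groups`."""
    total = len(parent_labels)
    remainder = sum(len(g) / total * entropy(g) for g in child_groups)
    return entropy(parent_labels) - remainder

# Toy example: a split that perfectly separates "yes" from "no"
gain = information_gain(["yes", "yes", "no", "no"],
                        [["yes", "yes"], ["no", "no"]])
print(gain)  # 1.0
```

At each node, the attribute whose split yields the highest gain is chosen.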
And right below, there's another example of a constructed tree, using the Wine recognition dataset, in data/wine_recognition.tsv, which has attributes with continuous values:
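A common way to split on a continuous attribute (an assumption about the approach; the repository's code may differ) is to sort the values and evaluate candidate thresholds at the midpoints between consecutive distinct values, keeping the one with the highest information gain. A minimal sketch:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct sorted values and
    return the cut point with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        remainder = sum(len(g) / len(pairs) * entropy(g) for g in (left, right))
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Hypothetical measurements for two classes of a continuous attribute
t, gain = best_threshold([12.1, 12.4, 13.2, 13.8], ["a", "a", "b", "b"])
print(t, gain)
```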
To view some of the resulting decision trees for the datasets in the data
folder, run the following command:
python3 main.py
The random forest implementation builds on the decision tree implementation presented earlier. To construct each tree of the forest, we perform a random sampling with replacement of the training data. This process is known as bootstrapping.
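Bootstrapping can be sketched as follows (function and variable names are illustrative):

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: a bootstrap sample."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

rng = random.Random(0)
data = list(range(10))  # stand-in for the training rows
sample = bootstrap_sample(data, rng)
print(len(sample))  # 10: same size as the original; rows may repeat while others are left out
```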
Moreover, at each node split during the decision tree construction, a random subset of the attributes is selected. The number of attributes sampled is √M, where M is the total number of features.
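This per-split attribute sampling could look like the sketch below (names are illustrative, not the repository's actual API):

```python
import math
import random

def sample_attributes(attributes, rng):
    """Draw ceil(sqrt(M)) attributes without replacement for one node split."""
    m = math.ceil(math.sqrt(len(attributes)))
    return rng.sample(attributes, m)

rng = random.Random(42)
attrs = [f"attr_{i}" for i in range(16)]  # M = 16 hypothetical features
subset = sample_attributes(attrs, rng)
print(len(subset))  # 4 = ceil(sqrt(16))
```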
The idea is that the variance of the ensemble decreases as the number of trees grows, at the cost of some additional bias.
To validate the trained forests, we perform cross-validation. Stratified sampling is performed once for each fold that needs to be constructed, so every fold preserves the class proportions of the full dataset.
When performing the validation, we iterate through each of the folds, using the current one as the test dataset and all the other ones to train a forest. For each fold, the accuracy of the predictions is calculated.
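The stratified k-fold procedure described above can be sketched as follows; `train_and_score` is a hypothetical callback that fits a forest on the training rows and returns its accuracy on the test rows:

```python
from collections import defaultdict

def stratified_folds(rows, label_of, k):
    """Split rows into k folds while preserving class proportions."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        for i, row in enumerate(members):
            folds[i % k].append(row)  # deal each class out round-robin
    return folds

def cross_validate(rows, label_of, k, train_and_score):
    """Use each fold once as the test set, training on the remaining folds,
    and collect one accuracy per fold."""
    folds = stratified_folds(rows, label_of, k)
    accuracies = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        accuracies.append(train_and_score(train, test))
    return accuracies
```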