Ensemble methods combine several models in order to reduce both variance and bias. They have become some of the most popular methods in Kaggle competitions and are widely used in industry across applications.
You saw two randomization techniques for combating overfitting:
- Bootstrap the data - that is, sample the data with replacement and fit your algorithm to the sampled data.
- Subset the features - in each split of a decision tree, or with each algorithm used in an ensemble, only a subset of the total possible features is used.
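The two techniques above can be sketched by hand with NumPy and scikit-learn decision trees. This is only an illustrative sketch, not the notebook's code: the synthetic dataset from `make_classification` stands in for the spam data, and the names `n_trees` and `n_sub_features` and their values are assumptions chosen for the example. Features are subset once per tree here, which corresponds to the "each algorithm used in an ensemble" variant.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the spam data (assumption: the notebook uses its own dataset).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25         # hypothetical ensemble size
n_sub_features = 5   # hypothetical number of features given to each tree

ensemble = []
for _ in range(n_trees):
    # Bootstrap the data: draw row indices with replacement.
    rows = rng.integers(0, len(X_train), size=len(X_train))
    # Subset the features: draw column indices without replacement.
    cols = rng.choice(X_train.shape[1], size=n_sub_features, replace=False)
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[rows][:, cols], y_train[rows])
    ensemble.append((tree, cols))

# Majority vote across the trees (labels are 0/1 in this synthetic data).
votes = np.array([tree.predict(X_test[:, cols]) for tree, cols in ensemble])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
accuracy = (y_pred == y_test).mean()
print(f"ensemble accuracy: {accuracy:.3f}")
```

In practice, scikit-learn's `BaggingClassifier` and `RandomForestClassifier` provide the same ideas ready-made, with `RandomForestClassifier` subsetting the features at each split rather than once per tree.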
- Spam_&_Ensembles.ipynb: Jupyter Notebook containing the implementation of ensemble methods using Python.
- README.md: This file, providing an overview of the repository.
To run the code in the Jupyter Notebook, you need to have Python installed on your system along with the following libraries:
- NumPy
- pandas
- scikit-learn
You can install these libraries using pip:
pip install numpy pandas scikit-learn
- Clone this repository to your local machine:
git clone https://github.com/BaraSedih11/More-Spam-Classifying.git
- Navigate to the repository directory:
cd More-Spam-Classifying
- Open and run the Jupyter Notebook Spam_&_Ensembles.ipynb using Jupyter Notebook or JupyterLab.
- Follow along with the code and comments in the notebook to understand how ensemble methods are implemented using Python.
- scikit-learn: The scikit-learn library for machine learning in Python.
- NumPy: The NumPy library for numerical computing in Python.
- pandas: The pandas library for data manipulation and analysis in Python.