More_Spam_Classifying

This repository contains an implementation of spam classification using ensemble methods in Python.

Overview

Ensemble Methods

Ensemble methods offer a way to optimize for both bias and variance by combining many models. They are among the most popular methods in Kaggle competitions and are widely used in industry.

There were two randomization techniques you saw to combat overfitting:

  • Bootstrap the data - that is, sample the data with replacement and fit your algorithm to the sampled data.
  • Subset the features - in each split of a decision tree, or with each algorithm used in an ensemble, only a subset of the total possible features is used.
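
The two techniques above can be sketched in a few lines of plain Python. This is only an illustration, not code from the notebook: the helper names `bootstrap_sample` and `feature_subset` and the toy rows are made up for the example, and only the standard library is used.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (some rows repeat, some are left out)."""
    return [rng.choice(rows) for _ in rows]


def feature_subset(n_features, k, rng):
    """Pick k distinct feature indices out of n_features."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy data: (feature vector, label) with 1 = spam, 0 = not spam.
rows = [
    ([3, 0, 1, 2], 1),
    ([0, 1, 0, 0], 0),
    ([2, 0, 2, 1], 1),
    ([0, 2, 0, 0], 0),
    ([1, 0, 3, 1], 1),
]

# Technique 1: bootstrap the data.
sample = bootstrap_sample(rows, rng)

# Technique 2: subset the features (here 2 of the 4 columns).
cols = feature_subset(n_features=4, k=2, rng=rng)

# Each model in an ensemble would be fit on its own sample and columns.
projected = [([x[i] for i in cols], y) for x, y in sample]

print("feature indices used:", cols)
for features, label in projected:
    print(features, label)
```

In an ensemble, this sampling is repeated once per model, so each model sees a slightly different view of the data and the combined prediction is less prone to overfitting.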

Contents

  • Spam_&_Ensembles.ipynb: Jupyter Notebook containing the implementation of ensemble methods for spam classification using Python.
  • README.md: This file providing an overview of the repository.

Requirements

To run the code in the Jupyter Notebook, you need to have Python installed on your system along with the following libraries:

  • NumPy
  • pandas
  • scikit-learn

You can install these libraries using pip:

pip install numpy pandas scikit-learn

Usage

  1. Clone this repository to your local machine:
git clone https://github.com/BaraSedih11/More-Spam-Classifying.git
  2. Navigate to the repository directory:
cd More-Spam-Classifying
  3. Open and run the Jupyter Notebook Spam_&_Ensembles.ipynb using Jupyter Notebook or JupyterLab.

  4. Follow along with the code and comments in the notebook to understand how ensemble methods are implemented using Python.
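
For orientation before opening the notebook, here is a rough sketch of what fitting ensemble classifiers on text might look like with scikit-learn. It is not taken from the notebook: the messages, labels, and the choice of `BaggingClassifier`, `RandomForestClassifier`, and `AdaBoostClassifier` with these settings are assumptions made for illustration, and the notebook's own data and models may differ.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
)
from sklearn.feature_extraction.text import CountVectorizer

# Made-up toy messages (1 = spam, 0 = not spam); far too small for real use.
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent winner call now",
    "are we still meeting for lunch",
    "see you at home tonight",
    "can you send me the report",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each message into a vector of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(messages)

# Three ensemble models; the hyperparameters are arbitrary choices for the sketch.
models = {
    "bagging": BaggingClassifier(n_estimators=25, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=25, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=25, random_state=0),
}

new_messages = ["claim your free prize", "lunch at home tonight"]
X_new = vectorizer.transform(new_messages)

predictions = {}
for name, model in models.items():
    model.fit(X_train, labels)
    predictions[name] = model.predict(X_new).tolist()
    print(name, predictions[name])
```

Bagging and random forests rely on the bootstrapping and feature subsetting described in the overview, while AdaBoost instead reweights the training examples between rounds.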

Acknowledgements

  • scikit-learn: The scikit-learn library for machine learning in Python.
  • NumPy: The NumPy library for numerical computing in Python.
  • pandas: The pandas library for data manipulation and analysis in Python.
