The rapid growth of SMS spam calls for robust filtering. This project builds a machine learning classifier that labels incoming SMS messages as spam or legitimate. It uses two Natural Language Processing (NLP) techniques: Bag-of-Words (BoW) to represent each message as a vector of word counts, and Term Frequency-Inverse Document Frequency (TF-IDF) to weight those counts so that words common to every message count for less than words concentrated in a few. From these weighted word frequencies the classifier learns which terms separate spam from legitimate messages and applies that to new SMS.
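To make the representation concrete, here is a minimal pure-Python sketch of BoW counting followed by TF-IDF weighting. It is an illustration, not the project's notebook code: the function names, the whitespace tokenizer, the three sample messages, and the unsmoothed idf = log(N / df) are all assumptions. Library implementations often use a smoothed idf variant, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(message):
    """Lowercase and split on whitespace (hypothetical, simplified tokenizer)."""
    return message.lower().split()


def bag_of_words(messages):
    """Return (vocabulary, count_vectors) for a list of messages."""
    tokenized = [tokenize(m) for m in messages]
    # Sorted vocabulary gives every term a stable column index.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        vectors.append(row)
    return vocabulary, vectors


def tfidf(vectors):
    """Weight raw counts by idf = log(N / df), an unsmoothed textbook form."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Document frequency: in how many messages each term appears.
    df = [sum(1 for row in vectors if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[count * idf[j] for j, count in enumerate(row)] for row in vectors]


# Made-up sample messages, for illustration only.
messages = [
    "free prize call now",
    "call me when you are free",
    "win a free prize now",
]
vocabulary, counts = bag_of_words(messages)
weights = tfidf(counts)
```

In this toy corpus "free" appears in every message, so its weight drops to zero, while "win" appears in only one message and keeps a positive weight. That is the sense in which TF-IDF highlights terms that distinguish messages from one another.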
The dataset is the SMS Spam Collection from the UCI Machine Learning Repository. See the dataset description there for details.
The corpus was collected by Tiago Agostinho de Almeida (http://www.dt.fee.unicamp.br/~tiago) and José María Gómez Hidalgo (http://www.esp.uem.es/jmgomez).
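As a rough sketch of how the corpus could be read, the snippet below assumes the UCI distribution's layout of one message per line, with the label (ham or spam) and the raw text separated by a tab. The file name SMSSpamCollection and both function names are assumptions for illustration; adjust them to wherever the file lives in your checkout.

```python
def parse_sms_lines(lines):
    """Split each 'label<TAB>message' line into a (label, message) pair."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the message survive.
        label, _, text = line.partition("\t")
        records.append((label, text))
    return records


def load_sms_corpus(path="SMSSpamCollection"):
    """Read the corpus from disk; the default path is a hypothetical location."""
    with open(path, encoding="utf-8") as handle:
        return parse_sms_lines(handle)


# Made-up sample lines in the assumed format, for illustration only.
sample = ["ham\tSee you at lunch?\n", "spam\tWINNER! Claim your prize now\n"]
records = parse_sms_lines(sample)
```

Keeping the parsing separate from the file access makes it easy to check the format handling on a few lines before loading the full corpus.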
Ensure you have the following dependencies installed:
- Python (version 3.9)
- Jupyter Notebook
- Other dependencies (refer to the requirements.txt)
You can install the required Python packages using:
pip install -r requirements.txt
- Clone the repository:
git clone https://github.com/SINGHxTUSHAR/BOW-TFIDF-spamBuster.git
cd BOW-TFIDF-spamBuster
- Create a virtual environment (optional but recommended):
python -m venv venv
- Activate the virtual environment:
- On Windows:
venv\Scripts\activate
- On macOS/Linux:
source venv/bin/activate
If you'd like to contribute to this project, please follow the standard GitHub fork and pull request process. Contributions, issues, and feature requests are welcome!
If you have any suggestions about this project, feel free to contact me at [email protected] or on LinkedIn.
This project is licensed under the MIT License - see the LICENSE file for details.