The Message Spam Detection project classifies messages as spam or non-spam using machine learning. It uses a Naive Bayes classifier, which is well suited to text classification tasks, together with a Flask web interface through which users can submit messages to the spam detection system.
The dataset used for this project is the SMS Spam Collection dataset from the UCI Machine Learning Repository. It contains 5,574 messages, of which 4,827 are non-spam and 747 are spam, each labeled accordingly. For this project, the dataset is split into a training set and a test set.
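The class counts above imply that the dataset is imbalanced, which matters when reading accuracy figures. The short calculation below is only an illustration based on the stated counts; the variable names are made up for this example.

```python
# Class counts as stated in the dataset description above.
ham_count = 4827   # non-spam messages
spam_count = 747   # spam messages
total = ham_count + spam_count

# Fraction of the dataset that is spam.
spam_share = spam_count / total

# Accuracy of a trivial model that labels every message as non-spam.
majority_baseline = ham_count / total

print(f"total messages: {total}")
print(f"spam share: {spam_share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Because a model that never predicts spam would already be right on roughly 87% of messages, accuracy is best read alongside precision and recall for the spam class.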
The preprocessing of the dataset involves the following steps:
- Tokenization: Splitting the messages into individual words
- Removing stop words: Eliminating common words that do not provide meaningful information
- Stemming: Reducing words to their root form
- Vectorization: Converting the messages into numerical vectors
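The steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual code: the stop-word list is a tiny hypothetical sample, `crude_stem` is a simplified stand-in for a real stemmer such as the ones NLTK provides, and `vectorize` builds plain word-count vectors of the kind a library vectorizer would produce.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "the", "to", "you", "your", "is", "are", "have", "for", "and", "of"}


def tokenize(message):
    """Lowercase the message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip a few common suffixes. A simplified stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(message):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(message))]


def vectorize(token_lists):
    """Build a sorted vocabulary and one word-count vector per message."""
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = []
    for tokens in token_lists:
        row = [0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        vectors.append(row)
    return vocabulary, vectors


messages = [
    "You have won free prizes, claim now!",
    "Are you joining the meeting today?",
]
processed = [preprocess(m) for m in messages]
vocabulary, vectors = vectorize(processed)

print(processed)
print(vocabulary)
print(vectors)
```

Each message ends up as a row of counts over a shared vocabulary, which is the numerical form a classifier can be trained on.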
The Naive Bayes classifier is used to classify the messages as spam or non-spam. The model is trained on the training set and evaluated on the test set. The performance of the model is measured using metrics such as accuracy, precision, recall, and F1 score.
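To make the training and evaluation steps concrete, here is a from-scratch sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, followed by the metrics named above. It is an illustration under stated assumptions rather than the project's implementation: the class name, the helper function, and the toy training and test messages are all invented for this example, and a library implementation such as the one in scikit-learn would normally be used instead.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""

    def fit(self, token_lists, labels):
        self.classes = sorted(set(labels))
        self.vocabulary = {t for tokens in token_lists for t in tokens}
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        class_counts = Counter(labels)
        n = len(labels)
        # Log prior: how common each class is in the training data.
        self.log_prior = {c: math.log(class_counts[c] / n) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_score(self, tokens, c):
        denom = self.total_words[c] + len(self.vocabulary)
        score = self.log_prior[c]
        for t in tokens:
            if t in self.vocabulary:  # ignore words never seen in training
                score += math.log((self.word_counts[c][t] + 1) / denom)
        return score

    def predict(self, tokens):
        """Return the class with the highest log posterior score."""
        return max(self.classes, key=lambda c: self._log_score(tokens, c))


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy, already-preprocessed messages (invented for illustration).
train_tokens = [
    ["win", "free", "prize", "now"],
    ["free", "cash", "claim", "now"],
    ["meet", "lunch", "today"],
    ["call", "me", "later", "today"],
]
train_labels = ["spam", "spam", "non-spam", "non-spam"]

test_tokens = [
    ["claim", "free", "prize"],
    ["lunch", "later"],
    ["free", "lunch", "today"],
    ["free", "call", "now"],
]
test_labels = ["spam", "non-spam", "non-spam", "non-spam"]

model = MultinomialNaiveBayes().fit(train_tokens, train_labels)
predictions = [model.predict(tokens) for tokens in test_tokens]

accuracy = sum(1 for t, p in zip(test_labels, predictions) if t == p) / len(test_labels)
precision, recall, f1 = precision_recall_f1(test_labels, predictions)

print(predictions)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy run the last test message is a legitimate one that shares words with the spam examples, so it is flagged as spam. That single false positive lowers precision while recall stays perfect, which shows why the metrics are reported together.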
For detailed code, refer to the following link:
https://github.com/Abhigyann-Singh/Message-Phishing-ML-Detection/blob/main/Mlmodels/spamsms.ipynb
The web interface is developed using Flask, a Python web framework. The interface allows users to input a message and receive a prediction of whether the message is spam or non-spam. The interface also provides visualizations of the model's performance metrics.
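A prediction endpoint of this kind might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the `/predict` route, the JSON request shape, and the `predict_label` placeholder, which uses a hypothetical keyword list where the real application would call the trained Naive Bayes model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical keyword list standing in for the trained model (assumption).
SPAM_HINTS = {"free", "prize", "winner", "claim", "cash"}


def predict_label(message):
    """Placeholder classifier: flags messages that contain any hint word."""
    words = set(message.lower().split())
    return "spam" if words & SPAM_HINTS else "non-spam"


@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    message = payload.get("message", "")
    if not isinstance(message, str) or not message.strip():
        return jsonify(error="Field 'message' is required."), 400
    return jsonify(message=message, prediction=predict_label(message))
```

Assuming the file is saved as `app.py`, it can be started with `flask run` and queried by sending a POST request with a JSON body such as `{"message": "Claim your free prize"}` to `/predict`.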
The Message Spam Detection project demonstrates the effectiveness of machine learning techniques in classifying messages as spam or non-spam. The Naive Bayes classifier achieves high accuracy in identifying spam messages, and the web interface provides a user-friendly experience for interacting with the spam detection system.
- UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/sms+spam+collection
- Flask: https://flask.palletsprojects.com/en/2.0.x/
- Scikit-learn: https://scikit-learn.org/stable/
- Matplotlib: https://matplotlib.org/
- Pandas: https://pandas.pydata.org/
- NLTK: https://www.nltk.org/
- NumPy: https://numpy.org/
- Seaborn: https://seaborn.pydata.org/
Nalin Angrish
LinkedIn: https://www.linkedin.com/in/nalin-angrish-7b5b3b1b3/
Abhigyan Singh