This project focuses on detecting cyberbullying using Natural Language Processing (NLP) techniques. Cyberbullying is a significant issue in the digital age, affecting individuals' mental health and well-being. The aim of this project is to develop a model that identifies and flags instances of cyberbullying in textual data.
The dataset used for this project is the Cyberbullying Tweets dataset available on Kaggle. It contains tweets labeled as cyberbullying or not, which makes it suitable for both training and testing the detection model.
- Data Collection and Preprocessing: Collecting the raw data and cleaning it to build a dataset for training and testing.
- Textual Analysis: Utilizing NLP techniques such as tokenization, stemming, and lemmatization to analyze the textual data.
- Feature Extraction: Implementing feature extraction methods including TF-IDF, word embeddings, and sentiment analysis to capture the nuances of the text.
- Model Development: Developing and training machine learning models, such as Logistic Regression, Support Vector Machines (SVM), and Naive Bayes, to detect cyberbullying.
- Evaluation: Evaluating the models using metrics such as accuracy, precision, recall, and F1-score to determine their effectiveness.
- Programming Languages: Python
- Libraries and Frameworks: NLTK, spaCy, scikit-learn
- Tools: Jupyter Notebook, pandas, NumPy, Matplotlib, Seaborn
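
The steps listed above (preprocessing, TF-IDF feature extraction, model training, and evaluation) can be sketched end to end with scikit-learn. This is a minimal illustration, not the project's actual notebook code: the example tweets are made-up placeholders rather than rows from the Kaggle dataset, and the `clean_tweet` helper, the bigram setting, and the train/test split ratio are all assumptions chosen for the sketch.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Made-up placeholder tweets, NOT rows from the Kaggle dataset.
# Label 1 = cyberbullying, 0 = not cyberbullying.
texts = [
    "you are so stupid, nobody likes you",
    "shut up loser, everyone hates you @user",
    "you are worthless and ugly",
    "nobody wants you here, just leave loser",
    "what a stupid ugly idiot http://example.com",
    "everyone hates you, you idiot",
    "had a great day at the park with friends",
    "looking forward to the weekend, see you there",
    "congrats on the new job, so happy for you",
    "this recipe is great, thanks for sharing",
    "see you at the game tonight @user",
    "happy birthday, hope you have a great day",
]
labels = [1] * 6 + [0] * 6


def clean_tweet(text):
    """Hypothetical helper: lowercase and strip URLs, @mentions, extra spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


# Hold out a stratified third of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, stratify=labels, random_state=0
)

# TF-IDF features (unigrams and bigrams) feeding a Logistic Regression model.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_tweet, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Report precision, recall, F1-score, and accuracy on the held-out tweets.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
```

Because the classifier is the last step of the pipeline, `LogisticRegression` can be swapped for `sklearn.svm.LinearSVC` or `sklearn.naive_bayes.MultinomialNB` to compare the other model families without touching the preprocessing or evaluation code. Scores on a toy sample this small are not meaningful; the sketch only shows how the pieces fit together.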
A Project by Quah Seng Kit, Sattish Pratap Shewkani, Yeo Yee Tao