NLP-Toxic-comment-classification

This is the repository for the NLP project carried out as part of the IASD and MASH master's programs at PSL University: toxic comment classification. The dataset, found on Kaggle, consists of comments labeled with the classes toxic, severe toxic, obscene, threat, insult and identity_hate, plus regular comments.

In this work, we tested several methods:

Apply a Bag Of Words (BOW) + Logistic Regression classifier.
Use GPT2 to obtain embeddings for the words in each comment and stack them into a padded matrix, then apply an MLP or a RandomForest to this matrix.
Extend the Multi-Level Graph Neural Network (MLGNN, slightly modified) to multi-class text classification on long sentences.
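
As an illustration of the first method, here is a minimal sketch of a Bag Of Words + Logistic Regression classifier. It is not the code from BOW.ipynb: the use of scikit-learn, the one-vs-rest wrapper for handling several labels per comment, the toy comments and the two label columns are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Toy comments standing in for the dataset's text column (hypothetical data).
comments = [
    "you are an idiot",
    "thanks for the helpful edit",
    "i will hurt you, idiot",
    "great article, well sourced",
    "shut up you idiot",
    "i will find you and hurt you",
    "nice work on this page",
    "please see the talk page",
]

# One binary column per class; only two classes here to keep the toy set small.
label_names = ["toxic", "threat"]
labels = np.array([
    [1, 0],
    [0, 0],
    [1, 1],
    [0, 0],
    [1, 0],
    [1, 1],
    [0, 0],
    [0, 0],
])

# CountVectorizer builds the bag-of-words counts; OneVsRestClassifier fits
# one logistic regression per label column (an assumption of this sketch).
model = make_pipeline(
    CountVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(comments, labels)

# predictions has one row per input comment and one column per label.
predictions = model.predict(["you idiot", "thanks for the edit"])
print(dict(zip(label_names, predictions.T.tolist())))
```

On the real dataset, the same pipeline would be fitted on the comment text and the six label columns, with a held-out split for evaluation.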

The notebook Dataset_stats.ipynb contains several statistics per class (average comment length per class, average number of exclamation marks per comment per class). The notebook BOW.ipynb illustrates a Bag Of Words + Logistic Regression classifier on the dataset. The notebook MLP&RandomForest.ipynb applies an MLP and a RandomForest to a matrix obtained with GPT2 embeddings. Finally, the notebook MLGNN.ipynb extends the MLGNN framework to our multi-class text classification problem, with long sentences.
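
The per-class statistics mentioned above can be sketched as follows. This is not the code from Dataset_stats.ipynb: pandas, the toy comments, the `class_stats` helper and the rule that a "regular" comment is one with none of the six labels set are assumptions made for the example. The column names follow the Kaggle train.csv file.

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Toy frame with the same columns as train.csv (hypothetical data).
df = pd.DataFrame({
    "comment_text": ["You idiot!!", "Thanks for the edit.", "Stop it! Now!", "Good point"],
})
for label in LABELS:
    df[label] = 0
df.loc[[0, 2], "toxic"] = 1
df.loc[0, "insult"] = 1

# Assumption: a comment is "regular" when none of the six labels is set.
df["regular"] = (df[LABELS].sum(axis=1) == 0).astype(int)

# Per-comment features: character length and number of exclamation marks.
df["length"] = df["comment_text"].str.len()
df["exclamations"] = df["comment_text"].str.count("!")


def class_stats(frame: pd.DataFrame) -> pd.DataFrame:
    """Average the per-comment features over the comments of each class.

    A comment can carry several labels, so it may count toward several rows.
    """
    rows = {}
    for label in LABELS + ["regular"]:
        subset = frame[frame[label] == 1]
        rows[label] = {
            "n_comments": len(subset),
            "avg_length": subset["length"].mean(),
            "avg_exclamations": subset["exclamations"].mean(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


stats = class_stats(df)
print(stats)
```

Classes with no comments in the toy frame show NaN averages, which is expected for an empty group.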

The report, report.pdf, analyzes our work.

Our work can be reproduced by running the standalone notebooks. The only requirement is to download the dataset file train.csv from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge.
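
Once train.csv is downloaded, a quick sanity check before running the notebooks could look like the sketch below. The `load_comments` helper is hypothetical and not part of the notebooks, and the derived `regular` column (no label set) is an assumption. The column names are those of the Kaggle file.

```python
import io
from pathlib import Path

import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_comments(source) -> pd.DataFrame:
    """Read the dataset and check that the expected columns are present.

    `source` can be a file path or any file-like object accepted by pandas.
    """
    df = pd.read_csv(source)
    missing = [col for col in ["comment_text", *LABELS] if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Assumption: a comment with no label set is treated as "regular".
    df["regular"] = (df[LABELS].sum(axis=1) == 0).astype(int)
    return df


# Small in-memory sample in the same layout as train.csv (hypothetical rows).
sample_csv = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,Thanks for the edit.,0,0,0,0,0,0\n"
    "a2,You idiot!!,1,0,0,0,1,0\n"
)
sample = load_comments(sample_csv)
print(sample[LABELS + ["regular"]].sum())

# With the real file in the working directory, report the share of each class.
if Path("train.csv").exists():
    train = load_comments("train.csv")
    print(train[LABELS + ["regular"]].mean())
```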

