Organised by Vancouver School of AI
Date: 4 December 2018
Build a classification model that can distinguish between toxic and non-toxic comments and use the model in a real-life application.
The meetups serve as guidance. The goal is for all attendees to build a good machine learning model that can be used in a real-life application. We encourage all attendees to apply creativity to this project. There are no limits.
All code is written in Python. Please use this guide to get Python and Jupyter Notebook up and running.
This project contains a Flask Web App and Keras NLP model files trained to identify levels of toxicity in comments.
It is deployed on Heroku.
The deployment instructions below will help you deploy it as your own web app on Heroku.
- Python: 3.6
- Flask: 1.0.2
- Keras: 2.2.4
- pandas: 0.23.4
- numpy: 1.15.4
- sklearn: 0.20.1
If you want to walk through the whole process from training to deployment, you first need to download the data to train the model.
To download the data, run:
python ml_model/download.py
This downloads the training data and the pre-trained embedding file to:
- ./assets/data/train.csv
- ./assets/embedding/fasttext-crawl-300d-2m/crawl-300d-2M.vec
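The `.vec` file is plain text: the first line holds the vocabulary size and the vector dimension (300 for `crawl-300d-2M.vec`), and every following line is a word followed by its vector components. Below is a minimal sketch of loading it and turning it into an embedding matrix for a tokenizer vocabulary. It assumes a Keras-style `word_index` dict mapping words to integer indices starting at 1. The function names `load_vectors` and `build_embedding_matrix` are hypothetical and are not taken from this repo's `ml_model` scripts. The usage example runs on a tiny in-memory sample instead of the real 2M-word file.

```python
import io
from typing import Dict, Iterable, Optional

import numpy as np


def load_vectors(lines: Iterable[str], vocab: Optional[set] = None) -> Dict[str, np.ndarray]:
    """Parse FastText .vec lines into a {word: vector} dict.

    The first line is a header "<num_words> <dim>". Each remaining line is
    "<word> <v1> ... <v_dim>". If `vocab` is given, only those words are kept,
    which saves a lot of memory with the 2M-word crawl file.
    """
    it = iter(lines)
    _, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines
        if vocab is None or word in vocab:
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors


def build_embedding_matrix(word_index: Dict[str, int], vectors: Dict[str, np.ndarray],
                           dim: int, max_features: int) -> np.ndarray:
    """Row i holds the vector of the word with index i.

    Row 0 is reserved for padding. Words without a pre-trained vector stay all-zero.
    """
    num_words = min(max_features, len(word_index) + 1)
    matrix = np.zeros((num_words, dim), dtype=np.float32)
    for word, idx in word_index.items():
        if idx >= num_words:
            continue  # word falls outside the kept vocabulary size
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix


# Tiny made-up sample in .vec format (header: 3 words, 4 dimensions).
sample = io.StringIO("3 4\nhello 0.1 0.2 0.3 0.4\nworld 1 2 3 4\ntoxic 5 6 7 8\n")
word_index = {"hello": 1, "world": 2, "unseen": 3}
vecs = load_vectors(sample, vocab=set(word_index))
emb = build_embedding_matrix(word_index, vecs, dim=4, max_features=10)
print(emb.shape)  # (4, 4)
```

For the real file you would pass an open file handle instead, for example `open("./assets/embedding/fasttext-crawl-300d-2m/crawl-300d-2M.vec", encoding="utf-8")`. Filtering by vocabulary while reading avoids holding all two million vectors in memory.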
To train the model, run:
python ml_model/train_classifier.py
This trains a pooled GRU on top of FastText embeddings. The text preprocessor and the model are serialized and stored in:
- ./assets/model/preprocessor.pkl
- ./assets/model/model.h5
Note: Training took about 1 hour on an Intel Core i7-HQ CPU.
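In a "pooled" GRU, the recurrent layer returns an output vector for every time step rather than only the last one. These per-step outputs are reduced with both global average pooling and global max pooling. The two pooled vectors are concatenated and fed to a sigmoid output layer. The NumPy sketch below illustrates only that pooling head, using made-up GRU outputs and weights. The actual layer sizes, dropout, and any bidirectional wrapper in `train_classifier.py` are not shown and may differ. In Keras 2 this head usually corresponds to `GlobalAveragePooling1D`, `GlobalMaxPooling1D`, `concatenate` and `Dense(..., activation="sigmoid")`. The function names here are hypothetical.

```python
import numpy as np


def pool_features(gru_outputs: np.ndarray) -> np.ndarray:
    """Concatenate global average and global max pooling over time.

    gru_outputs: (batch, timesteps, units) per-step GRU outputs.
    Returns: (batch, 2 * units).
    """
    avg_pool = gru_outputs.mean(axis=1)  # (batch, units)
    max_pool = gru_outputs.max(axis=1)   # (batch, units)
    return np.concatenate([avg_pool, max_pool], axis=1)


def pooled_head(gru_outputs: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Pooling followed by a dense sigmoid layer.

    weights: (2 * units, n_outputs), bias: (n_outputs,).
    Returns probabilities in (0, 1) with shape (batch, n_outputs).
    """
    logits = pool_features(gru_outputs) @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))


# One sequence, 2 time steps, 2 GRU units (made-up values).
outputs = np.array([[[1.0, 0.0], [3.0, 2.0]]])
print(pool_features(outputs))  # average [2, 1] followed by max [3, 2]

# Random weights only to show the shapes; a trained model learns these.
rng = np.random.RandomState(0)
probs = pooled_head(rng.randn(2, 5, 3), rng.randn(6, 1), np.zeros(1))
print(probs.shape)  # (2, 1)
```

Combining both poolings lets the classifier use the overall tone of a comment (average) and a single strongly toxic phrase (max), which matters for short insults inside long comments.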
To get a feel for how predictions work, run:
python ml_model/predict.py
# output:
# Corgi is stupid - Toxicity: [0.99293655]
# good boy - Toxicity: [0.02075008]
# School of AI is awesome - Toxicity: [0.01223523]
# F**K - Toxicity: [0.90747666]
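`predict.py` prints one toxicity score per comment, which is a probability between 0 and 1. To turn scores into labels, an application has to pick a cut-off. Below is a minimal sketch applied to the scores shown above. It assumes a threshold of 0.5, a common default that this project does not specify. `label_comments` is a hypothetical helper, not part of the repo.

```python
from typing import Dict, Iterable, Tuple


def label_comments(scored: Iterable[Tuple[str, float]], threshold: float = 0.5) -> Dict[str, str]:
    """Map (comment, toxicity score) pairs to 'toxic' or 'non-toxic'.

    The 0.5 default is an assumption. Raise it to flag fewer comments
    (fewer false positives), lower it to flag more.
    """
    return {text: ("toxic" if score >= threshold else "non-toxic") for text, score in scored}


# Scores copied from the predict.py output above.
scores = [
    ("Corgi is stupid", 0.99293655),
    ("good boy", 0.02075008),
    ("School of AI is awesome", 0.01223523),
    ("F**K", 0.90747666),
]
for text, label in label_comments(scores).items():
    print(f"{text!r}: {label}")
```

Raising the threshold trades recall for precision. At 0.95, for example, only "Corgi is stupid" would still be flagged.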
- Create a Heroku account if you don't already have one.
- Install Heroku CLI.
- Fork this GitHub repo.
- Clone the forked repo to your local machine.
- Navigate to your cloned repo directory.
- Create your Heroku app via
heroku create <app-name>
- Deploy your app via
git push heroku master
- Open your newly deployed web app in your web browser by running
heroku open
The project uses data from Kaggle's Toxic Comment Classification Challenge. The data can be found here.
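In the Kaggle challenge, each row of `train.csv` has an `id`, the `comment_text`, and six binary label columns: `toxic`, `severe_toxic`, `obscene`, `threat`, `insult` and `identity_hate`. This README does not say how the project reduces these labels to the single toxicity score printed by `predict.py`. The sketch below uses pandas to check the class balance and shows one hypothetical option, an "any label set" target. It runs on a two-row, made-up in-memory sample instead of the real file.

```python
import io

import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Made-up sample with the same columns as Kaggle's train.csv.
sample_csv = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    'a1,"a friendly comment",0,0,0,0,0,0\n'
    'b2,"an insulting comment",1,0,0,0,1,0\n'
)
# For the real data: train = pd.read_csv("./assets/data/train.csv")
train = pd.read_csv(sample_csv)

# Fraction of comments carrying each label (class balance check).
print(train[LABELS].mean())

# Hypothetical single target: 1 if any of the six labels is set.
train["any_toxic"] = (train[LABELS].max(axis=1) > 0).astype(int)
print(train[["comment_text", "any_toxic"]])
```

On the real data the label columns are heavily imbalanced, so checking these rates before training helps when choosing a loss, metric, or decision threshold.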
If you are struggling to implement some of the concepts discussed at the meetup, check out the slides notebook for guidance. There are also many kernels specific to the toxic comment challenge that you can refer to for inspiration or help.
Alternatively, ask for assistance on Slack. That's what this community is all about :)
Other Resources: