organised by Vancouver School of AI
Date: 5 November 2018
Build a classification model that can distinguish between toxic and non-toxic comments and use the model in a real-life application.
The meetups serve as guidance. The goal is for all attendees to build a good machine learning model that can be used in a real-life application. We encourage all attendees to apply creativity to this project. There are no limits.
All code is written in Python. Please use this guide to get Python and Jupyter Notebook up and running.
The project uses data from Kaggle's Toxic Comment Classification Challenge. The data can be found here.
If you are struggling to implement some of the concepts discussed at the meetup, check out the notebook in this repo for guidance. There are also many kernels specific to the toxic comment challenge that you can refer to for inspiration or help.
Alternatively, ask for assistance on Slack. That's what this community is all about :)
Due Date: Sunday, 18 November 2018 (PT)
Challenge: Perform an exploratory data analysis (EDA) on the Toxic Comment Classification Challenge data. After gaining a better understanding of the data, apply some of the text preprocessing techniques to the comments.
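A minimal EDA sketch along these lines is shown below. It uses a small hand-made stand-in DataFrame so it runs on its own; in practice you would load Kaggle's `train.csv`, and the label columns shown here (`toxic`, `insult`) are just two of the six labels in the real dataset.

```python
import pandas as pd

# Stand-in sample; with the real data you would instead do:
# df = pd.read_csv("train.csv")
df = pd.DataFrame({
    "comment_text": ["You are great!", "I hate you, idiot", "Nice post"],
    "toxic": [0, 1, 0],
    "insult": [0, 1, 0],
})

labels = ["toxic", "insult"]

# Class balance: fraction of comments carrying each label
print(df[labels].mean())

# Comment length distribution
df["length"] = df["comment_text"].str.len()
print(df["length"].describe())

# Fraction of "clean" comments (no toxic label at all)
clean_fraction = (df[labels].sum(axis=1) == 0).mean()
print(f"clean fraction: {clean_fraction:.2f}")
```

On the real data you would typically find the labels are heavily imbalanced (most comments are clean), which is worth documenting in your EDA before modelling.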
Everyone is encouraged to participate! Whatever you do for this code challenge will be used as a starting point for your classification model (which will be discussed at the next meetup).
The winning solution should ideally contain:
- a well-documented EDA
- sensible data preprocessing techniques
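As one possible starting point, here is a small preprocessing sketch using only the standard library: lowercasing, URL and digit removal, punctuation stripping, tokenization, and a tiny illustrative stopword set (a real project would use a fuller list, e.g. from NLTK or spaCy).

```python
import re
import string

# Tiny illustrative stopword set; swap in a full list for real use
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "you", "i"}

def preprocess(text: str) -> list:
    """Lowercase, strip URLs/punctuation/digits, tokenize, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)           # drop numbers
    return [t for t in text.split() if t not in STOPWORDS]

print(preprocess("You are SO annoying!!! Visit http://spam.example now, 100 times."))
# → ['so', 'annoying', 'visit', 'now', 'times']
```

Which steps are "sensible" depends on the data, so justify each choice in your EDA (for example, toxic comments may rely on capitalization or repeated punctuation, which this sketch throws away).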
To submit, post your submission's repository link on the #code_challenge Slack channel (in the Vancouver School of AI workspace) before the due date.