Tracking Changes in #MeToo Movement with Deep Learning

Motivation

Social media provides a unique opportunity to shed light on phenomena that have previously been ignored or swept under the rug. The #MeToo movement began in 2006 when Tarana Burke first coined the phrase, but a viral Twitter post in 2017 served to lift the hashtag into the mainstream accompanied by the public reckoning of several high-profile men. This hashtag allows people to seek solidarity on a much larger scale in calling out the perpetrators of their abuse. We are going to categorize tweets with #MeToo into the years they were posted. In particular, we look at the years 2017, 2018, and 2019.

After compiling a classification model, we plan to use the LIME package to explain the pattern learned by the deep learning model. Such a process allows us to have a direct understanding of the difference the posts of the three years. Specifically, LIME provides us with a linear coefficient to every word that appears in one tweet, and tells us which one of them contributes most in categorizing them to one of three years. In using this approach, we can learn the difference by ourselves and explain them to others in language, which is far more better than a black box machine learning model. We list several observations that have a high possibility in each year, and show why our model categorizes it into such category. Some initial discussions are raised at the end of this report.

Data Description

Our raw data consists of tweets with the hashtag #MeToo from the years of 2017, 2018, and 2019 (links below). We take these datasets and modify them such that we keep only the year the tweet was created in, and the modified text of the tweet (removing hashtags, mentions, retweets, stopwords, emojis, hyperlinks, etc.). The preprocessing can be found in Preprocessing.ipynb, and additional helper functions can be found in custom_regex.py, modify_df.py, and sentence_processing.py.

Links to datasets

350K #MeToo Tweets (October 2017): https://data.world/rdeeds/350k-metoo-tweets
390K #MeToo Tweets (November 2017 - December 2017): https://data.world/balexturner/390-000-metoo-tweets
695K Hatred on Twitter During #MeToo Movement (September 2018 - February 2019): https://www.kaggle.com/rahulgoel1106/hatred-on-twitter-during-metoo-movement
15K #MeToo Tweets (October 2019): https://www.kaggle.com/mohamadalhasan/metoo-tweets-dataset

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
2017_lime		2017_lime
2018_lime		2018_lime
2019_lime		2019_lime
.gitattributes		.gitattributes
.gitignore		.gitignore
CNN.ipynb		CNN.ipynb
CNN_Lime.ipynb		CNN_Lime.ipynb
Exploration.ipynb		Exploration.ipynb
Modeling.ipynb		Modeling.ipynb
Preprocessing.ipynb		Preprocessing.ipynb
Project_F_Report_Group_3.pdf		Project_F_Report_Group_3.pdf
README.md		README.md
cnn_with_embedding.ipynb		cnn_with_embedding.ipynb
custom_regex.py		custom_regex.py
evaluation.py		evaluation.py
explore.py		explore.py
modify_df.py		modify_df.py
sentence_processing.py		sentence_processing.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tracking Changes in #MeToo Movement with Deep Learning

Motivation

Data Description

Links to datasets

About

Releases

Packages

Contributors 2

Languages

shilpakancharla/metoo-classification

Folders and files

Latest commit

History

Repository files navigation

Tracking Changes in #MeToo Movement with Deep Learning

Motivation

Data Description

Links to datasets

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages