Replication of Comparative Studies of Detecting Abusive Language on Twitter

This repository contains the replication project for the Winter 2020 Data Reproducibility course (DATA 598) in the Master of Science in Data Science program at the University of Washington.

CONTRIBUTORS

CONTENTS

Lately, there has been considerable effort and research on identifying abusive or offensive content on online platforms and social media. A relatively large and reliable dataset, 'Hate and Abusive Speech on Twitter', was recently published. As data scientists, we understand the need to find the best methods and data for identifying such content and flagging it as inappropriate.

In this repository, we aim to replicate some of the findings of a research paper that performs a comparative study of hate and abusive speech classification on Twitter data and suggests additional features and data for improving such classification. Using the data and code provided by the authors, we aim to replicate the efficacy and accuracy of the Logistic Regression model presented in the paper. The original paper compares 5 different machine learning and deep learning algorithms. For our replication, we chose the Logistic Regression model with word-level features because the authors state that this model outperformed all the other machine learning models and achieved an F1-score equivalent to that of the best CNN model. In addition, our limited computational resources put the execution of the other machine learning and deep learning models out of scope.
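To illustrate what "Logistic Regression with word-level features" and the F1-score refer to, here is a minimal, standard-library-only Python sketch. It is not the code used in this replication (which is written in R) and not the original authors' code, and it does not reproduce LiblineaR's API. Assumptions: the task is reduced to binary labels (abusive vs. not abusive), word-level features are unigram counts from a simple lowercase regex tokenizer, the model is fit with plain stochastic gradient descent, and the hyperparameters (`epochs`, `lr`, `l2`) and all function names (`featurize`, `train_logreg`, `predict`, `f1_score`) are hypothetical. The toy tweets are invented for illustration.

```python
import math
import re
from collections import Counter


def featurize(text):
    """Word-level bag-of-words counts (unigrams only, an assumption)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(weights, bias, feats):
    """Linear score w.x + b over sparse feature counts."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_logreg(docs, labels, epochs=50, lr=0.5, l2=0.001):
    """Fit binary logistic regression with plain SGD on the log-loss.

    L2 shrinkage is applied only to the weights of features present in the
    current example (a common sparse-data simplification).
    """
    weights, bias = {}, 0.0
    examples = [featurize(d) for d in docs]
    for _ in range(epochs):
        for feats, y in zip(examples, labels):
            err = sigmoid(score(weights, bias, feats)) - y  # d(log-loss)/dz
            bias -= lr * err
            for f, v in feats.items():
                w = weights.get(f, 0.0)
                weights[f] = w - lr * (err * v + l2 * w)
    return weights, bias


def predict(weights, bias, text):
    """Return 1 (abusive) if P(abusive | text) >= 0.5, else 0."""
    return 1 if sigmoid(score(weights, bias, featurize(text))) >= 0.5 else 0


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy tweets (1 = abusive, 0 = not abusive), for illustration only.
docs = ["you are an idiot", "have a nice day",
        "shut up you idiot", "what a lovely day"]
labels = [1, 0, 1, 0]

weights, bias = train_logreg(docs, labels)
preds = [predict(weights, bias, d) for d in docs]
print(preds, f1_score(labels, preds))
```

The F1-score is the harmonic mean of precision and recall, so it penalizes a classifier that achieves high accuracy simply by predicting the majority (non-abusive) class. A multi-class version of this sketch would need one-vs-rest or softmax weights; it stays binary here for brevity.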

Citation: Lee, Y., Yoon, S., & Jung, K. (2018). Comparative studies of detecting abusive language on Twitter. arXiv preprint arXiv:1808.10245.

URL: https://arxiv.org/abs/1808.10245

Git Repository: https://github.com/younggns/comparative-abusive-lang/blob/master/README.md

DATA

All data files required for our replication project can be found in the 'Data' directory of this repository. URL: https://github.com/UW-MSDS-DATA-598-Reproducibility-WI20/goel-modi-moroney-ramprasad-replication-project/tree/master/Data

This directory contains details about the original data used by the authors of the paper, as well as the data sampled and processed for this replication study. Please refer to the README.md in the Data directory for additional details.

ANALYSIS

The analysis directory contains the R Markdown report detailing the procedure and results of this replication study. It also contains the intermediate outputs, R scripts, data and images required to knit the R Markdown report successfully. For additional details, please refer to the README.md in this directory. URL: https://github.com/UW-MSDS-DATA-598-Reproducibility-WI20/goel-modi-moroney-ramprasad-replication-project/tree/master/analysis

DEPENDENCIES

OS type and version: Windows 10 Pro, Version 1903, OS build 18362.535

System type: 64-bit OS, x64-based processor

R version: >=3.6.2

R packages and versions:

R Package	Version
caret	6.0-84
future	1.16.0
tm	0.7-7
quanteda	1.5.2
LiblineaR	2.10-8
stringr	1.4.0
here	0.1
ggplot2	3.2.1
wordcloud	2.6
bookdown	0.17
dplyr	0.8.3
knitr	1.28

LICENSE

The project is licensed under the MIT License. Please read our license details in LICENSE.md.

Text and figures: MIT + file LICENSE
Code: MIT + file LICENSE
Data: MIT + file LICENSE

CONTRIBUTING

We welcome contributions from everyone. If you would like to make a contribution, please read our contributor guidelines (CONTRIBUTING.md). Please note that this project is released with a Contributor Code of Conduct (CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.
