Task description:

This repository contains online discussion data annotated for inappropriate language. The annotations were produced by experts, by crowd workers, and with ChatGPT.

annotations:

ChatGPT_explicit: Annotations of explicitly inappropriate language produced by ChatGPT.
ExplicitlyInappropriateLanguageInContext: Crowd and expert annotations that mark instances of explicitly inappropriate language.

codes:

Scripts and code used for data processing and analysis.

data:

Holds the raw and processed data used for annotation and analysis. This includes input data in various formats and intermediate data sets generated during processing.

LingoTurk files:

Contains files related to the LingoTurk platform, which was used for collecting annotations. This includes task configurations and instructions.

statistics:

Includes statistical reports and summaries derived from the data set.

the analysis of annotations:

Contains detailed analyses of the annotation results, including comparisons between annotation methods, inter-annotator agreement, error analysis, and insights into annotation discrepancies.
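
As a generic illustration of inter-annotator agreement, the sketch below computes Cohen's kappa for two annotators who labelled the same items. This is an assumption made for illustration only: the analyses in this folder may use a different measure (for example Krippendorff's alpha, or Fleiss' kappa for more than two annotators), and the function name `cohens_kappa` and the label values are hypothetical, not taken from the repository.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item; kappa is
        # undefined here, so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for four comments.
annotator_1 = ["inappropriate", "appropriate", "inappropriate", "appropriate"]
annotator_2 = ["inappropriate", "appropriate", "appropriate", "appropriate"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on 3 of 4 items (observed agreement 0.75), while their label distributions give a chance agreement of 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.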

Usage:

Researchers and developers interested in content moderation, natural language processing, and online discourse analysis can benefit from this data set and associated resources.

Citation:

If you use this data set or findings from this repository in your research or projects, please consider citing this repository and our paper.
Citing the paper: https://aclanthology.org/2024.trac-1.11/

@inproceedings{barbarestani-etal-2024-content,
  title = "Content Moderation in Online Platforms: A Study of Annotation Methods for Inappropriate Language",
  author = "Barbarestani, Baran and Maks, Isa and Vossen, Piek T.J.M.",
  editor = "Kumar, Ritesh and Ojha, Atul Kr. and Malmasi, Shervin and Chakravarthi, Bharathi Raja and Lahiri, Bornini and Singh, Siddharth and Ratan, Shyam",
  booktitle = "Proceedings of the Fourth Workshop on Threat, Aggression {&} Cyberbullying @ LREC-COLING-2024",
  month = may,
  year = "2024",
  address = "Torino, Italia",
  publisher = "ELRA and ICCL",
  url = "https://aclanthology.org/2024.trac-1.11",
  pages = "96--104"
}

Citing the repository: https://github.com/cltl/InappropriateLanguageDetection

Contact:

For questions, please contact b[dot]barbarestani[at]vu[dot]nl.