Semantic-Enhanced Crowdsourcing Study for Target Group Identification - Code

This repository contains the source code to reproduce the paper "Enhancing Hate Speech Annotations with Background Semantics" (ECAI 2024): https://oro.open.ac.uk/98676/

The data repository is available on Open Research Data Online (ORDO).

Repo structure

The raw data is organised in the following folders:

Annotators: anonymised demographic tables from Prolific. Each participant appears in exactly one file, according to whether they are (i) a heterosexual cis man (M_MH), (ii) a heterosexual cis woman (W_WH), or an LGBTQ+ member because of their (iii) gender (trans, G_T, or non-binary, G_NB) or (iv) sexuality (non-heterosexual, S_H).
Data: contains semantic and crowdsourcing annotations. Crowdsourcing annotations were collected as shown in the example figure and in the full documentation.
Semantic_annotation: Jupyter notebooks that add background knowledge to the hate speech sample using a knowledge graph, i.e., the GSSO (pruned_concepts.csv), and other linguistic resources (missing_concepts.csv).
Documentation: contains the approved Ethics Application Form and Participant Information Sheet.

The source code is in the scripts folder, in the following Python files:

dataCollect.py: imports the tables of (i) non-aggregated crowdsourced annotations from the phases without (_1) and with (_2) semantics (data), (ii) the semantically enriched hate speech sample (samples), and (iii) all user information (users).
agreement.py: contains functions to compute inter-annotator agreement (Krippendorff's Alpha and Fleiss' Kappa on the 87% of posts that have 6 annotations).
helper.py: helper functions to analyse alignment (Pearson's correlation) and change after semantics (categorisation by agreement and by the decision made on target groups).
utils.py: plotting functions for the table plot (agreement and correlation, Figure 2), the horizontal bar chart and Sankey diagram (frequency and shifts, Figure 3), and the heatmap (category overlap, Figure 4).
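
As a rough illustration of two of the measures named above, here is a minimal, dependency-free sketch of Fleiss' Kappa and Pearson's correlation. It is not the code in agreement.py or helper.py: the function names (fleiss_kappa, pearson_r), the category labels, and the toy ratings are all hypothetical and chosen only to show how the statistics are computed when every post has the same number of annotations (6 here).

```python
from collections import Counter
from math import sqrt


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that all have the same number of ratings.

    ratings: list of lists, one inner list of category labels per item.
    categories: every label an annotator could choose (hypothetical here).
    Undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    # Count how many annotators picked each category for each item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[k] for c in counts) / (n_items * n_raters) for k in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): two posts, six annotations each.
toy_ratings = [["gender"] * 6, ["sexuality"] * 6]
toy_kappa = fleiss_kappa(toy_ratings, ["gender", "sexuality"])
```

Fleiss' Kappa assumes a fixed number of ratings per item, which is why the sketch rejects inputs where posts have different numbers of annotations.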

All files used for evaluation in the paper are in the results folder.

Run files

The code runs on Python 3.12 with the packages listed in requirements.txt:

    hateRep <user-login>$ python main.py

Phase 2 Annotation Example (with semantics)

There is a PDF showing the full annotation study with examples provided by participants.

Texts in Phase 2 were annotated as shown below:

[Figure: Phase 2 annotation layout]

In Phase 1, the same layout is presented but without underlined terms in the post and with an empty column on the left.
