The dataset contains 397 instances, each formatted as a minimal pair: a shared prefix followed by a good and a bad continuation, where "good" and "bad" reflect the plausibility of the resulting phrase.
For example:
- The actor won the {award=good, battle=bad}
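As a rough illustration of this format, the sketch below represents one minimal pair and computes the fraction of pairs for which a scorer prefers the good continuation. It rests on assumptions: the `MinimalPair` class, its field names, and `toy_score` are hypothetical and are not taken from the original repository. In practice the scorer would likely be a language model's log-probability of the continuation given the prefix.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class MinimalPair:
    # Hypothetical field names; the actual columns in corpus.csv may differ.
    prefix: str
    good_continuation: str
    bad_continuation: str


def pair_accuracy(
    pairs: Iterable[MinimalPair],
    score: Callable[[str, str], float],
) -> float:
    """Return the fraction of pairs where the good continuation scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pair_accuracy needs at least one pair")
    correct = sum(
        score(p.prefix, p.good_continuation) > score(p.prefix, p.bad_continuation)
        for p in pairs
    )
    return correct / len(pairs)


def toy_score(prefix: str, continuation: str) -> float:
    # Placeholder scorer with invented values, used only to make the sketch runnable.
    toy_table = {"award": -1.0, "battle": -3.0}
    return toy_table[continuation]


example = MinimalPair("The actor won the", "award", "battle")
accuracy = pair_accuracy([example], toy_score)
```

Using a strict comparison means ties count as errors, which is a design choice of this sketch rather than something stated in the source.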
The code, data, and documentation in this folder were forked from Hu, Jennifer & Levy, Roger (2023). Please refer to their original repository for additional information.
The content of the original README.md is attached below.
The original file `clean_DTFit_human_dat.csv` was downloaded from here on March 22, 2023.
Please see Kauf, Ivanova et al. (2022) for more details.
Here is a description from their paper about human scores:
Human judgments for Dataset 2 had been previously collected by Vassallo et al. (2018) on Prolific, a web-based platform for collecting behavioral data. Participants in this experiment answered questions of the form “How common is it for an actor to win an award?” on a Likert scale from 1 (very atypical) to 7 (very typical).
The data are originally from Vassallo et al. (2018).
The final `corpus.csv` file was created by running the script `python make_corpus.py`.
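The contents of `make_corpus.py` are not reproduced here. As a minimal sketch of how rated sentences could be turned into minimal pairs, the example below groups items by shared prefix and labels the higher-rated continuation as good. Everything in it is an assumption for illustration: the `build_pairs` helper, the tuple layout, the output keys, and the toy ratings are hypothetical and do not come from the original script or data.

```python
from collections import defaultdict


def build_pairs(rows):
    """Group (prefix, continuation, mean_rating) tuples into minimal pairs.

    Hypothetical helper: the real make_corpus.py may work differently.
    """
    by_prefix = defaultdict(list)
    for prefix, continuation, rating in rows:
        by_prefix[prefix].append((rating, continuation))

    pairs = []
    for prefix, rated in by_prefix.items():
        if len(rated) != 2:
            continue  # skip prefixes that do not have exactly two continuations
        (low_rating, bad), (high_rating, good) = sorted(rated)
        if low_rating == high_rating:
            continue  # skip ties, since neither continuation is clearly better
        pairs.append(
            {"prefix": prefix, "good_continuation": good, "bad_continuation": bad}
        )
    return pairs


# Invented ratings on a 1-7 scale, for illustration only.
toy_rows = [
    ("The actor won the", "award", 6.5),
    ("The actor won the", "battle", 2.1),
]
pairs = build_pairs(toy_rows)
```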