This repo contains the data and code introduced in the following ACL 2024 paper:
VariErr NLI: Separating Annotation Error from Human Label Variation
- The paper introduces a systematic methodology and a new dataset, VariErr (variation versus error), focusing on the NLI task in English.
- VariErr was collected with a 2-round annotation procedure: annotators first explain each label they assign, then judge the validity of the resulting label-explanation pairs.
- VariErr contains 7,732 validity judgments on 1,933 explanations for 500 re-annotated MNLI items from ChaosNLI.
- This repo includes code to assess how effectively various automatic error detection (AED) methods and GPT models uncover errors versus human label variation.
- We release our 2-round NLI annotations in `varierr.json` (see the loading sketch below).
- The dataset is also available at https://huggingface.co/datasets/mainlp/varierr.
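For a quick look at the released annotations, `varierr.json` can be inspected with the standard library. This is a minimal sketch that assumes only that the file is valid JSON; the exact field names depend on the release, so it prints the schema rather than hard-coding keys:

```python
import json

# Load the 2-round VariErr annotations released in this repo.
with open("varierr.json", encoding="utf-8") as f:
    varierr = json.load(f)

print(f"Loaded {len(varierr)} top-level entries")

# Peek at one entry to see the annotation schema
# (labels, explanations, and validity judgments).
first = varierr[0] if isinstance(varierr, list) else next(iter(varierr.values()))
print(json.dumps(first, indent=2, ensure_ascii=False)[:500])
```

Alternatively, the Hugging Face copy can likely be pulled with `datasets.load_dataset("mainlp/varierr")`, assuming the hosted files are in a format the `datasets` library can auto-detect.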
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Run the `run_experiments.sh` bash script to train the sequence classification models and produce the supervised, scorer, LLM, and baseline predictions:

  ```bash
  bash run_experiments.sh
  ```
- The main results are in `results/results.tsv`; additional result files are in `results/` (see the inspection sketch below).
- Predictions are saved in `predictions/`.
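To inspect the aggregated numbers programmatically, something like the following sketch should work. It assumes only that `results/results.tsv` is a tab-separated file with a header row; the actual column names depend on which experiments were run:

```python
import pandas as pd

# Load the aggregated results table written by run_experiments.sh.
results = pd.read_csv("results/results.tsv", sep="\t")

# Show the available columns and the first few rows.
print(results.columns.tolist())
print(results.head())
```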
- If you use VariErr, please cite:

```bibtex
@inproceedings{weber-2024-varierr,
  title = {{VariErr NLI: Separating Annotation Error from Human Label Variation}},
  author = {Leon Weber-Genzel and Siyao Peng and Marie-Catherine de Marneffe and Barbara Plank},
  year = {2024},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
}
```
- This work is funded by ERC Consolidator Grant DIALECT 101043235 and supported by project KLIMA-MEMES funded by the Bavarian Research Institute for Digital Transformation (bidt), an institute of the Bavarian Academy of Sciences and Humanities.
- The authors are responsible for the content of this publication.