Code for our paper "Table-based Fact Verification with Salience-aware Learning" (Findings of EMNLP 2021).
pip install -r requirements.txt
Install pytorch_scatter.
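After both install steps, a quick sanity check can confirm that the environment is usable. The snippet below is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and it assumes that pytorch_scatter is importable as `torch_scatter` alongside `torch`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: pytorch_scatter installs a top-level module named `torch_scatter`.
    missing = missing_modules(["torch", "torch_scatter"])
    print("missing:", ", ".join(missing) if missing else "none")
```

If `torch_scatter` is reported as missing, the wheel most likely does not match the installed PyTorch/CUDA combination.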
We conduct experiments on the TabFact dataset. The statements in the officially released train/val/test sets are lemmatized; we use the raw (unlemmatized) statements instead. More discussion can be found in this issue.
Download the train/val/test set to `./data`.
Download the table set to `./data/tables`.
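Before preprocessing, it can help to verify that the downloads landed where the scripts expect them. The following is a hedged sketch: `check_data_layout` is a hypothetical helper, and it only checks for the two directories named above, since the exact file names inside them are not specified here.

```python
from pathlib import Path


def check_data_layout(root="."):
    """Return a list of problems found with the expected data/ and data/tables/ layout."""
    data_dir = Path(root) / "data"
    tables_dir = data_dir / "tables"
    problems = []
    if not data_dir.is_dir():
        problems.append(f"missing directory: {data_dir}")
    if not tables_dir.is_dir():
        problems.append(f"missing directory: {tables_dir}")
    elif not any(tables_dir.iterdir()):
        # The directory exists but nothing was downloaded into it.
        problems.append(f"no table files in: {tables_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_data_layout("."):
        print(problem)
```

An empty result means both directories exist and the table directory is non-empty; it does not validate the contents of the files.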
To convert raw data to model inputs:
cd data
python preprocess.py
cd token_salience
- First, run `bash run_origin.sh` to get predictions for the original inputs.
- Second, run `bash run_masked.sh` to get predictions for inputs with masked tokens.
- Third, run `python calculate_salience.py` to get salience scores by comparing the outputs of the previous two steps.
- Finally, run `python add_salience_to_data.py` to merge the salience scores into the input data.
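The comparison in the third step can be pictured with a small example. This is an illustrative sketch only, under the assumption that a token's salience is the drop in the probability assigned to the gold label when that token is masked; the function names are hypothetical and `calculate_salience.py` may compute the score differently.

```python
def salience_scores(p_original, p_masked):
    """Score each token by how much masking it lowers the gold-label probability.

    p_original: gold-label probability on the unmasked input.
    p_masked:   list where p_masked[i] is that probability with token i masked.
    """
    return [p_original - p for p in p_masked]


def most_salient(tokens, scores, k=1):
    """Return the k tokens with the highest salience scores."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return [tokens[i] for i in ranked[:k]]


if __name__ == "__main__":
    tokens = ["the", "won", "game"]
    scores = salience_scores(0.9, [0.88, 0.35, 0.9])
    print(most_salient(tokens, scores, k=1))
```

In this toy input, masking "won" causes the largest drop in confidence, so it ranks as the most salient token.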
cd token_replacement
- First, run `bash run_mlm.sh` to get predictions for replacing non-salient tokens.
- Second, run `python add_token_replacement.py` to merge the token replacement candidates into input data.
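To illustrate how salience scores and replacement candidates might fit together, here is a minimal sketch. It assumes that the candidates are given as a mapping from token position to a replacement token (for example, produced by a masked language model) and that the least salient positions are replaced first; `replace_non_salient` is a hypothetical name and does not describe the repository's actual data format.

```python
def replace_non_salient(tokens, scores, candidates, k=1):
    """Walk positions from least to most salient and replace until k tokens are swapped.

    tokens:     list of statement tokens.
    scores:     salience score per token (lower means less salient).
    candidates: dict mapping token position -> replacement token.
    """
    order = sorted(range(len(tokens)), key=lambda i: scores[i])
    out = list(tokens)
    replaced = 0
    for i in order:
        if replaced == k:
            break
        if i in candidates:  # positions without a candidate are left unchanged
            out[i] = candidates[i]
            replaced += 1
    return out


if __name__ == "__main__":
    tokens = ["the", "team", "won", "in", "2010"]
    scores = [0.01, 0.3, 0.5, 0.02, 0.6]
    candidates = {0: "a", 3: "during"}
    print(replace_non_salient(tokens, scores, candidates, k=2))
```

Salient tokens such as "won" and "2010" are left untouched in this example, so the rewritten statement is expected to keep its original label.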
cd joint_model
bash run_joint_model.sh
@inproceedings{wang-etal-2021-table-based,
    title = "Table-based Fact Verification With Salience-aware Learning",
    author = "Wang, Fei and
      Sun, Kexuan and
      Pujara, Jay and
      Szekely, Pedro and
      Chen, Muhao",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.338",
    pages = "4025--4036"
}