This repository provides the script, data, and trained models used in our paper. In a nutshell, TANDA is a technique for fine-tuning pre-trained Transformer models sequentially in two steps:
- first, transfer a pre-trained model to a model for a general task by fine-tuning it on a large and high-quality dataset;
- then, perform a second fine-tuning step to adapt the transferred model to the target domain.
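In code, the two steps amount to two consecutive fine-tuning runs of the same model. The sketch below illustrates the idea with the Hugging Face Trainer API on toy stand-in data; it is not the script used in the paper (that is the patched run_glue.py described below), and the PairDataset helper is only an illustrative placeholder.

```python
# Illustrative sketch of the two TANDA steps (transfer, then adapt) with the
# transformers Trainer API. Toy data stands in for ASNQ and the target dataset.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

class PairDataset(torch.utils.data.Dataset):
    """Question / candidate-sentence pairs with binary relevance labels."""
    def __init__(self, pairs, labels, tokenizer):
        self.enc = tokenizer([q for q, _ in pairs], [s for _, s in pairs],
                             truncation=True, padding=True, return_tensors="pt")
        self.labels = torch.tensor(labels)
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        return {**{k: v[i] for k, v in self.enc.items()}, "labels": self.labels[i]}

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Step 1 (transfer): fine-tune on a large, high-quality general dataset (ASNQ).
asnq_toy = PairDataset([("who wrote hamlet", "Hamlet was written by William Shakespeare.")], [1], tok)
Trainer(model=model,
        args=TrainingArguments(output_dir="transfer", learning_rate=2e-5, num_train_epochs=2),
        train_dataset=asnq_toy).train()

# Step 2 (adapt): continue fine-tuning the transferred model on the target dataset
# (e.g. Wiki-QA), typically with a smaller learning rate.
wikiqa_toy = PairDataset([("what is a glacier", "A glacier is a persistent body of dense ice.")], [1], tok)
Trainer(model=model,
        args=TrainingArguments(output_dir="adapt", learning_rate=1e-6, num_train_epochs=2),
        train_dataset=wikiqa_toy).train()
```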
We base our implementation on the transformers package and use the following commands to enable the sequential fine-tuning option for the package:
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout f3386 -b tanda-sequential-finetuning
git apply tanda-sequential-finetuning-with-asnq.diff
```
`f3386` is the latest commit as of Sun Nov 17 18:08:51 2019 +0900, and `tanda-sequential-finetuning-with-asnq.diff` is the diff to enable the option.
For example, to transfer with ASNQ and adapt with a target dataset:
- download the ASNQ dataset and the target dataset (e.g. Wiki-QA, formatted similarly to ASNQ), and
- run the following commands, first to transfer on ASNQ and then to adapt on the target dataset:
```bash
python run_glue.py \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --task_name ASNQ \
    --do_train \
    --do_eval \
    --do_lower_case \
    --data_dir [PATH-TO-ASNQ] \
    --per_gpu_train_batch_size 150 \
    --learning_rate 2e-5 \
    --num_train_epochs 2.0 \
    --output_dir [PATH-TO-TRANSFER-FOLDER]

python run_glue.py \
    --model_type bert \
    --model_name_or_path [PATH-TO-TRANSFER-FOLDER] \
    --task_name ASNQ \
    --do_train \
    --do_eval \
    --sequential \
    --do_lower_case \
    --data_dir [PATH-TO-WIKI-QA] \
    --per_gpu_train_batch_size 150 \
    --learning_rate 1e-6 \
    --num_train_epochs 2.0 \
    --output_dir [PATH-TO-OUTPUT-FOLDER]
```
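After the adapt step, the folder in `--output_dir` can be loaded back with transformers to rank candidate answer sentences for a question. The snippet below is only a usage sketch: it assumes run_glue.py saved both the model and the tokenizer to that folder, and treating logit index 1 as the "correct answer" class is an assumption.

```python
# Usage sketch: score and rank candidate answer sentences with the adapted model.
# Replace the placeholder path with your actual output folder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_dir = "[PATH-TO-OUTPUT-FOLDER]"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

question = "who wrote hamlet"
candidates = [
    "Hamlet is a tragedy written by William Shakespeare.",
    "The play is set in Denmark.",
]

inputs = tokenizer([question] * len(candidates), candidates,
                   truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    # Probability of the positive class (assumed to be index 1) for each candidate.
    scores = model(**inputs).logits.softmax(dim=-1)[:, 1]

for score, cand in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{score:.3f}\t{cand}")
```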
We use the following datasets in the paper:
- ASNQ is a dataset for answer sentence selection derived from the Google Natural Questions (NQ) dataset (Kwiatkowski et al. 2019). The dataset details can be found in our paper.
- ASNQ is used to transfer the pre-trained models in the paper, and can be downloaded here.
- ASNQ-Dev++ can be downloaded here.
- Wiki-QA: we used the Wiki-QA dataset from here and removed all the questions that have no correct answers (a filtering sketch follows this list).
- TREC-QA: we used the `*-filtered.jsonl` version of this dataset from here.
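For reference, the Wiki-QA filtering mentioned above (dropping questions with no correct answer) can be reproduced in a few lines. This is a hedged sketch, not the exact preprocessing script we used: the column names follow the official WikiQA TSV release and the file names are placeholders.

```python
# Sketch: keep only Wiki-QA questions that have at least one correct (Label == 1) candidate.
# Column names (Question, Label) follow the official WikiQA-*.tsv files; adjust if yours differ.
import pandas as pd

df = pd.read_csv("WikiQA-train.tsv", sep="\t")
has_answer = df.groupby("Question")["Label"].transform("max") == 1
df[has_answer].to_csv("WikiQA-train.filtered.tsv", sep="\t", index=False)
```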
We also release the following TANDA fine-tuned models:
- TANDA: BERT-Base ASNQ → Wiki-QA
- TANDA: BERT-Large ASNQ → Wiki-QA
- TANDA: RoBERTa-Large ASNQ → Wiki-QA
- TANDA: BERT-Base ASNQ → TREC-QA
- TANDA: BERT-Large ASNQ → TREC-QA
- TANDA: RoBERTa-Large ASNQ → TREC-QA
The paper appeared in the AAAI 2020 proceedings. Please cite our work if you find our paper, dataset, pretrained models or code useful:
```bibtex
@article{Garg_2020,
  title={TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection},
  volume={34},
  ISSN={2159-5399},
  url={http://dx.doi.org/10.1609/AAAI.V34I05.6282},
  DOI={10.1609/aaai.v34i05.6282},
  number={05},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  publisher={Association for the Advancement of Artificial Intelligence (AAAI)},
  author={Garg, Siddhant and Vu, Thuy and Moschitti, Alessandro},
  year={2020},
  month={Apr},
  pages={7780–7788}
}
```
The documentation, including the shared data and models, is made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. See the LICENSE file.
The sample script within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.
For help or issues, please submit a GitHub issue.
For direct communication, please contact Siddhant Garg (https://github.com/sid7954), Thuy Vu (thuyvu is at amazon dot com), or Alessandro Moschitti (amosch is at amazon dot com).