This repository is the official implementation of our ACL'24 Findings paper Contrastive Instruction Tuning.
Experiments are run in the following environment:
| Package | Version |
| --- | --- |
| conda | 22.9.0 |
| Python | 3.8 |
| CUDA | 11.8 |
To set up the environment, run:
conda create -n coin python=3.8
conda activate coin
pip install -r requirements.txt
The original data source of our new dataset is the FLAN collection, specifically Muennighoff/flan on Huggingface. The model we used is Alpaca trained with LoRA. We used code from Alpaca-LoRA as a starting point and added our implementations on top. We follow the steps described in Section 3.2 of the paper to curate the dataset for CoIN, which is available here.
- Each entry contains (see the loading sketch below):
  - The original instruction-input pair (`original_instruction`)
  - The paraphrased instruction-input pair (`paraphrased_instruction`)
  - The label (`targets`)
  - The task name
  - Keyword data (a dictionary of key-value pairs that are parsed into the instruction templates to form the full input)
- Instruction templates are available here.
- Every entry at an odd index is the hard negative of the entry above it.
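For orientation, here is a minimal sketch of reading the data. It assumes the release is a JSON list whose entries carry the fields listed above; the file path is a placeholder, not the repository's actual layout.

```python
import json

# Minimal sketch, assuming the CoIN data is a JSON list of entries with the
# fields described above; the file path below is a placeholder.
with open("data/coin_train.json", "r") as f:
    entries = json.load(f)

# Entries come in pairs: each odd-index entry is the hard negative of the one above it.
for anchor, hard_negative in zip(entries[0::2], entries[1::2]):
    print(anchor["original_instruction"])     # original instruction-input pair
    print(anchor["paraphrased_instruction"])  # paraphrased instruction-input pair
    print(anchor["targets"])                  # label
    break
```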
Parameters are defined in `run_contrastive.sh`. Check `ContrastiveLlamaTrainingArgument` in `run_contrastive_llama.py` for more details regarding the default values of all parameters.
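As a rough illustration of where `do_contrastive` lives, a hedged sketch of the argument class follows. The real definition (and its many other fields and defaults) is in `run_contrastive_llama.py`; everything here beyond the class and flag names is an assumption.

```python
from dataclasses import dataclass, field
from transformers import TrainingArguments

# Illustrative sketch only -- the actual ContrastiveLlamaTrainingArgument in
# run_contrastive_llama.py defines many more fields; the default shown for
# do_contrastive is an assumption.
@dataclass
class ContrastiveLlamaTrainingArgument(TrainingArguments):
    do_contrastive: bool = field(
        default=True,
        metadata={"help": "Train with the contrastive objective; set to False "
                          "to run the data-augmentation-only baseline."},
    )
```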
- To start training the CoIN model, run:
  bash scripts/run_contrastive.sh
- To train the continually instruction-tuned model (training with data augmentation only), change `do_contrastive` to FALSE.
In this project, we follow PromptBench to add perturbations to instructions. All perturbed instructions for 10 GLUE tasks are available here. To evaluate a model, please:
- Go to `scripts/eval_contrastive.sh`.
- Change `checkpoint_dir` to the path of your checkpoint/output directory.
- Run:
  bash scripts/eval_contrastive.sh
- You can change `perturb_method` and `promptbench_eval_task` to evaluate the model on different perturbation methods and evaluation tasks. Supported perturbation methods and tasks are listed in the bash script and in `UnseenInstructionEvalArgs` in `run_contrastive_llama.py` (a hedged sketch of these arguments follows below).
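Mirroring the training-argument sketch above, here is a hedged sketch of the evaluation arguments. Only the class and field names come from this README; the types, defaults, and help text are assumptions, and the supported choices are listed in the actual script and class.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative sketch only -- the real UnseenInstructionEvalArgs in
# run_contrastive_llama.py enumerates the supported options; the defaults
# shown here are assumptions.
@dataclass
class UnseenInstructionEvalArgs:
    perturb_method: Optional[str] = field(
        default=None,
        metadata={"help": "PromptBench perturbation method to evaluate on."},
    )
    promptbench_eval_task: Optional[str] = field(
        default=None,
        metadata={"help": "GLUE task to evaluate on."},
    )
```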
To obtain the average accuracy (exact match) and standard deviation of the model on the perturbed instructions for each task, run:
python promptbench/postprocessing.py --output_dir "YOUR_OUTPUT_DIR"
- The evaluation script stores the model's outputs in a directory named `preds` under your model's checkpoint directory.
- Substitute `YOUR_OUTPUT_DIR` with the path where the outputs are stored (e.g., `output/CoIN/preds`).
- The script will produce a CSV file named `unseen_instruction_acc.csv` under `YOUR_OUTPUT_DIR` (a hedged sketch of the per-task aggregation it reports follows).
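For a sense of what the summary contains, here is a hedged sketch of the per-task aggregation, assuming one exact-match accuracy per perturbation method has already been computed; how those per-method accuracies are parsed from the `preds` files is handled by `promptbench/postprocessing.py` itself.

```python
from statistics import mean, stdev
from typing import Dict, Tuple

# Sketch only: given one exact-match accuracy per perturbation method for a
# single task, report the (average, standard deviation) pair that the CSV
# summarizes per task.
def summarize_task(per_method_accuracy: Dict[str, float]) -> Tuple[float, float]:
    scores = list(per_method_accuracy.values())
    return mean(scores), stdev(scores)
```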
@inproceedings{yan2024contrastive,
title={Contrastive Instruction Tuning},
author={Yan, Lorena Tianyi and Wang, Fei and Huang, James Y and Zhou, Wenxuan and Yin, Fan and Galstyan, Aram and Yin, Wenpeng and Chen, Muhao},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2024},
year={2024}
}