This repository contains the code for our upcoming EMNLP paper: Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation.
Alignment with human preferences is an important step in developing accurate and safe large language models. Machine translation (MT) is no exception: better handling of language nuances and context-specific variations leads to improved quality. However, preference data based on human feedback can be very expensive to obtain and curate at a large scale. Automatic metrics, on the other hand, can induce preferences, but they might not match human expectations perfectly. In this paper, we propose an approach that leverages the best of both worlds. We first collect sentence-level quality assessments from professional linguists on translations generated by multiple high-quality MT systems and evaluate the ability of current automatic metrics to recover these preferences. We then use this analysis to curate a new dataset, MT-Pref (metric-induced translation preference), which comprises 18k instances covering 18 language directions, using texts sourced from multiple domains post-2022. We show that aligning TOWER models on MT-Pref significantly improves translation quality on the WMT23 and FLORES benchmarks.
The MT-Pref dataset is available here: sardinelab/MT-pref. We release the raw, unfiltered translation outputs from all models together with scores from multiple metrics. We also include all evaluations of the trained models on the WMT23 and FLORES datasets under TowerEval-Data-v0.1 for reproducibility.
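If you just want to explore the data, it can be loaded with the Hugging Face `datasets` library. A minimal sketch is shown below; the split and column names are not specified here, so inspect the dataset card for the exact schema:

```python
# Minimal sketch: load MT-Pref from the Hugging Face Hub.
# Splits and column names are dataset-specific; check the
# sardinelab/MT-pref dataset card for the exact schema.
from datasets import load_dataset

dataset = load_dataset("sardinelab/MT-pref")
print(dataset)  # available splits, columns, and sizes

# Peek at one instance from the first available split.
first_split = next(iter(dataset.values()))
print(first_split[0])
```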
- Install the required libraries:
```bash
pip install --upgrade pip
pip install trl==0.10.1 deepspeed accelerate peft wandb evaluate sacrebleu unbabel-comet vllm
```
- Then install the build prerequisites for `flash-attn`:
```bash
pip install wheel packaging setuptools
```
- Finally, install `flash-attn` itself:
```bash
pip install flash-attn --no-build-isolation
```
- To run evaluation using `tower-eval`, install the library as detailed here.
All results in the paper can be reproduced by running:
```bash
# Train the models on MT-Pref
bash train_configs.sh
# Run generation and evaluation with tower-eval
python -m tower_eval.cli gen-eval --config configs/eval_configs.yaml
```
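For orientation, the sketch below shows what preference alignment with the pinned `trl` version (0.10.x) looks like in code. It is not the repository's actual training pipeline (that is driven by `train_configs.sh` and its configs); the model name, hyperparameters, split, and column mapping are placeholders and assumptions.

```python
# Illustrative sketch only: preference alignment (DPO) with trl 0.10.x.
# The repository's actual training is run via train_configs.sh; the model
# name and hyperparameters below are placeholders, not the paper's setup.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Unbabel/TowerInstruct-7B-v0.2"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPOTrainer expects string columns named "prompt", "chosen", and
# "rejected"; MT-Pref's columns may need renaming to match. The
# "train" split name is also an assumption; check the dataset card.
train_dataset = load_dataset("sardinelab/MT-pref", split="train")

args = DPOConfig(
    output_dir="outputs/dpo-mt-pref",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    beta=0.1,  # strength of regularization toward the reference model
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl creates a frozen reference copy internally
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```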
TBA