WMT 2023 Metrics Shared Task Submission: Cometoid

This repository contains documentation and scripts for Cometoid and Chrfoid metrics.

Our models are implemented in Marian NMT. We provide Python bindings for Marian, which can be installed as follows:

Prerequisites: Install PyMarian
git clone https://github.com/marian-nmt/marian-dev
cd marian-dev
# The pybind PR (github.com/marian-nmt/marian-dev/pull/1013) was not merged at the time of writing,
# so check out the branch
git checkout tg/pybind-new

# Build and install -- run this from the project root, i.e., the dir having pyproject.toml
# Simple case: no CUDA, default compiler
pip install -v .

# Optional: use a specific compiler version (e.g., gcc-9 / g++-9)
CMAKE_ARGS="-DCMAKE_C_COMPILER=gcc-9 -DCMAKE_CXX_COMPILER=g++-9" pip install -v .

# Optional: with CUDA enabled
CMAKE_ARGS="-DCOMPILE_CUDA=ON" pip install -v .

# Optional: with a specific version of the CUDA toolkit, e.g., CUDA 11.5
CMAKE_ARGS="-DCOMPILE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.5" pip install -v .
Tip
If CMake complains about missing C/C++ libraries, follow the instructions at https://marian-nmt.github.io/quickstart/ to install them.
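
After installation, a quick import check confirms the bindings are usable. This is a minimal sketch; Evaluator is the class used in the Python example further below.

# Minimal sanity check that the pymarian bindings installed correctly
from pymarian import Evaluator

print("pymarian import OK:", Evaluator)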

The following reference-free/quality-estimation metrics are available:

  • chrfoid-wmt23

  • cometoid22-wmt23

  • cometoid22-wmt22

  • cometoid22-wmt21

pymarian-evaluate is a convenient CLI that downloads and caches models, and runs them on the input data.

# download sample data
sacrebleu -t wmt21/systems -l en-zh --echo src > src.txt
sacrebleu -t wmt21/systems -l en-zh --echo Online-B > mt.txt

# chrfoid
paste src.txt mt.txt | pymarian-evaluate --stdin -m chrfoid-wmt23
# cometoid22-wmt22; similarly for other models
paste src.txt mt.txt | pymarian-evaluate --stdin -m cometoid22-wmt22
Python API: Evaluator
from pymarian import Evaluator

# Marian CLI-style arguments: model, vocabularies (source and target), and metric type
marian_args = '-m path/to/model.npz -v path/to/vocab.spm path/to/vocab.spm --like comet-qe'
evaluator = Evaluator(marian_args)

# Each item is a [source, hypothesis] pair for the reference-free (QE) metrics
data = [
    ["Source1", "Hyp1"],
    ["Source2", "Hyp2"]
]
scores = evaluator.run(data)
for score in scores:
    print(score)
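
The same API can score whole files. The following is a minimal sketch, assuming the src.txt and mt.txt produced in the pymarian-evaluate example above and that evaluator.run yields one numeric score per [source, hypothesis] pair, as in the snippet above; the model and vocabulary paths are placeholders.

# Minimal sketch: score src.txt/mt.txt line pairs and average segment scores
# into a single system-level score.
from pymarian import Evaluator

marian_args = '-m path/to/model.npz -v path/to/vocab.spm path/to/vocab.spm --like comet-qe'
evaluator = Evaluator(marian_args)

with open('src.txt', encoding='utf-8') as srcs, open('mt.txt', encoding='utf-8') as hyps:
    data = [[s.rstrip('\n'), h.rstrip('\n')] for s, h in zip(srcs, hyps)]

seg_scores = list(evaluator.run(data))
sys_score = sum(seg_scores) / len(seg_scores)   # average of segment scores
print(f'{len(seg_scores)} segments, system score = {sys_score:.4f}')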
Tip
pymarian-evaluate downloads and caches models under ~/.cache/marian/metrics/<model>.

Models can be downloaded from https://textmt.blob.core.windows.net/www/models/mt-metric/<model>.tgz, where <model> is one of the metric names listed above (e.g., cometoid22-wmt22).
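
If you prefer to fetch a model manually, the following is a minimal sketch (assuming the URL pattern above and that the archive unpacks into a <model> directory) that downloads and extracts one into the same cache directory that pymarian-evaluate uses.

# Minimal sketch: download a metric model archive and unpack it under
# ~/.cache/marian/metrics (the cache dir mentioned in the tip above).
import tarfile
import urllib.request
from pathlib import Path

model = "cometoid22-wmt22"   # one of the metric names listed above
url = f"https://textmt.blob.core.windows.net/www/models/mt-metric/{model}.tgz"
cache = Path.home() / ".cache" / "marian" / "metrics"
cache.mkdir(parents=True, exist_ok=True)

tgz_path = cache / f"{model}.tgz"
urllib.request.urlretrieve(url, tgz_path)      # download the archive
with tarfile.open(tgz_path, "r:gz") as tar:
    tar.extractall(cache)                      # unpack next to the cached models

The models are also hosted on Hugging Face (e.g., marian-nmt/cometoid22-wmt23) and can be scored with the marian CLI directly: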

model_id=marian-nmt/cometoid22-wmt23
model=$(huggingface-cli download $model_id checkpoints/marian.model.bin)
vocab=$(huggingface-cli download $model_id vocab.spm)

# Score on CPU
N_CPU=6
paste src.txt mt.txt | marian evaluate --like comet-qe -m $model -v $vocab $vocab \
        -w 8000 --quiet --cpu-threads $N_CPU --mini-batch 1 --maxi-batch 1

# Score on GPUs; here "--devices 0 1 2 3" means use 4 GPUs
paste src.txt mt.txt | marian evaluate --like comet-qe -m $model -v $vocab $vocab \
        -w -4000 --quiet --devices 0 1 2 3 --mini-batch 16 --maxi-batch 1000
Step 1: Set up and obtain the test set
# install requirements, including mt-metrics-eval
pip install -r requirements.txt

# download the mt-metrics-eval data package
python -m mt_metrics_eval.mtme --download

# Flatten test data as a TSV file; this will be used in subsequent steps
# and also helps speed up evaluation
data_file=$(python -m evaluate -t wmt22 flatten --no-ref)
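
To sanity-check the flattened file before scoring, a small sketch like the one below can be used; it only assumes the file is tab-separated, with the column indices taken from Step 2's cut -f4,6 (source and hypothesis).

# Minimal sketch: inspect the flattened TSV produced in Step 1.
import sys

data_file = sys.argv[1]   # pass the path printed by the flatten step above
with open(data_file, encoding="utf-8") as f:
    rows = [line.rstrip("\n").split("\t") for line in f]

print(f"{len(rows)} segments, {len(rows[0])} columns per row")
src, hyp = rows[0][3], rows[0][5]   # fields 4 and 6 (1-based), as used in Step 2
print("example source    :", src)
print("example hypothesis:", hyp)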
Step 2: Get Segment-Level Scores
devices="0 1 2 3 4 5"   # run on GPUs with these device IDs
n_segs=$(wc -l < $data_file)
for m in chrfoid-wmt23 cometoid22-wmt{21,22,23}; do
    out=${data_file%.tsv}.seg.score.$m;
    [[ -f $out._OK ]] && continue; `# skip metrics that already finished`
    rm -f $out; `# remove incomplete file, if any`
    cut -f4,6 $data_file | pymarian-evaluate --stdin -d $devices -m $m | tqdm --desc=$m --total=$n_segs > $out && touch $out._OK;
done
Step 3: Evaluate on WMT22 Metrics testset
# average seg scores to system scores; then evaluate
for m in chrfoid-wmt23 cometoid22-wmt{21,22,23}; do
    score_file=${data_file%.tsv}.seg.score.$m;
    [[ -f $score_file._OK ]] || { echo "Error: $score_file._OK not found"; continue;}
    python -m evaluate -t wmt22 full --no-ref --name $m --scores $score_file;
done

This produces results.csv.

Step 4: Verify results
$ cat results.csv  | grep -E -i 'wmt22|cometoid|chrf|comet-22'
,wmt22.mqm_tab11,wmt22.da_sqm_tab8
*chrfoid-wmt23[noref],0.7773722627737226,0.8321299638989169
*cometoid22-wmt21[noref],0.7883211678832117,0.8483754512635379
*cometoid22-wmt22[noref],0.8065693430656934,0.8574007220216606
*cometoid22-wmt23[noref],0.8029197080291971,0.8592057761732852
COMET-22,0.8394160583941606,0.8393501805054152
MS-COMET-22,0.8284671532846716,0.8303249097472925
chrF,0.7335766423357665,0.7581227436823105
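
The CSV can also be inspected programmatically. The sketch below assumes the three-column layout visible in the output above (metric name, wmt22.mqm_tab11, wmt22.da_sqm_tab8) and ranks metrics by their MQM correlation.

# Minimal sketch: read results.csv and rank metrics by the wmt22.mqm_tab11 column.
import csv

with open("results.csv") as f:
    reader = csv.reader(f)
    header = next(reader)        # e.g. ['', 'wmt22.mqm_tab11', 'wmt22.da_sqm_tab8']
    rows = []
    for row in reader:
        try:
            rows.append((row[0], float(row[1]), float(row[2])))
        except (IndexError, ValueError):
            continue             # skip rows that do not follow the name,corr,corr layout

for name, mqm, da in sorted(rows, key=lambda r: r[1], reverse=True):   # best MQM first
    print(f"{name:28s}  mqm={mqm:.4f}  da_sqm={da:.4f}")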

Please cite this paper: https://aclanthology.org/2023.wmt-1.62/

@inproceedings{gowda-etal-2023-cometoid,
    title = "Cometoid: Distilling Strong Reference-based Machine Translation Metrics into {E}ven Stronger Quality Estimation Metrics",
    author = "Gowda, Thamme  and
      Kocmi, Tom  and
      Junczys-Dowmunt, Marcin",
    editor = "Koehn, Philipp  and
      Haddow, Barry  and
      Kocmi, Tom  and
      Monz, Christof",
    booktitle = "Proceedings of the Eighth Conference on Machine Translation",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.wmt-1.62",
    pages = "751--755",
}
