This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Deprecation notice for get_ende_bleu.sh #1827

Open



@kpu kpu commented Jul 7, 2020

This script is harmful because it propagates a non-standard way to compute BLEU that is not reflective of the WMT 2014 task. Entirely too many papers are submitted with BLEU scores computed in undocumented ways. It's not even reasonable to allow people to run this script to compare against prior work, because most prior work does not document which script it used. And there are multiple of these running around.

https://www.aclweb.org/anthology/W18-6319/

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


@googlebot added the "cla: no" label (PR author has not signed CLA) on Jul 7, 2020
@martinpopel
Contributor

Yes, I fully agree.

It is even worse, because using get_ende_bleu.sh alone is not enough to replicate the "Attention Is All You Need" scores. I found this out in April 2018 with the great and much appreciated help of @lukaszkaiser (documented on Gitter at https://gitter.im/tensor2tensor/Lobby?at=5acfe16c7c3a01610dd81b46 and in several previous posts of mine):

wmt13   wmt14   evaluation
26.25   27.52   sacrebleu
26.59   28.33   sacrebleu -tok intl
27.10   28.85   sacrebleu -tok intl -lc
?????   29.02   get_ende_bleu.sh (with google-colab reference newstest2014.tok.de)
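To illustrate why the `-lc` row scores higher than the cased rows, here is a toy sketch of clipped unigram precision with and without lowercasing. It is not sacreBLEU and not full BLEU (no higher-order n-grams, no brevity penalty); the function name and the example sentences are made up for illustration.

```python
from collections import Counter


def clipped_unigram_precision(hyp: str, ref: str, lowercase: bool = False) -> float:
    """Toy clipped unigram precision on whitespace tokens (hypothetical helper)."""
    if lowercase:
        # Lowercasing both sides turns case mismatches into matches.
        hyp, ref = hyp.lower(), ref.lower()
    hyp_counts = Counter(hyp.split())
    ref_counts = Counter(ref.split())
    # Each hypothesis token is credited at most as often as it occurs in the reference.
    matches = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matches / total if total else 0.0


if __name__ == "__main__":
    hyp = "das Haus ist Klein ."
    ref = "Das Haus ist klein ."
    print(clipped_unigram_precision(hyp, ref))                  # cased
    print(clipped_unigram_precision(hyp, ref, lowercase=True))  # lowercased
```

Any change to casing or tokenization shifts the match counts in the same way, which is why scores computed under different settings are not comparable.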

If anyone is trying to reproduce the BLEU scores from the Attention Is All You Need paper, note that you must manually tweak the newstest2014.de reference file (in addition to all the hacks in get_ende_bleu.sh): convert all Unicode quotes, including the lower quote „, to ", which tokenizer.perl will then escape to &quot; (but make sure not to run tokenizer.perl twice, to prevent double escaping). No script in mosesdecoder performs this tweak: replace-unicode-punctuation.perl ignores lower quotes, and normalize-punctuation.perl -l de changes the order of comma-quote to quote-comma, which is not what we want. As the table above shows, the difference between the official BLEU (sacreBLEU) and the Google-tweaked BLEU can be as large as 1.5 BLEU.
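A minimal sketch of the quote tweak described above, assuming it amounts to mapping Unicode double quotes to the ASCII `"` before tokenizer.perl is run once. The exact character set used for the paper is not given in this thread, so the set below (and the helper name) is an assumption.

```python
# Assumed set of Unicode double quotes: “ ” „ ‟
# The thread only says "all unicode quotes", including the lower quote „.
UNICODE_DOUBLE_QUOTES = "\u201c\u201d\u201e\u201f"

# Translation table mapping each of those characters to the ASCII double quote.
_QUOTE_TABLE = str.maketrans({ch: '"' for ch in UNICODE_DOUBLE_QUOTES})


def normalize_quotes(line: str) -> str:
    """Replace Unicode double quotes with ASCII '"' (hypothetical helper).

    Nothing else is changed, so the order of comma and quote is preserved.
    """
    return line.translate(_QUOTE_TABLE)


if __name__ == "__main__":
    print(normalize_quotes("\u201eHallo,\u201c sagte er."))
```

Unlike normalize-punctuation.perl -l de, this leaves the comma-quote order alone. The escaping to &quot; is left to tokenizer.perl, which should be run only once on the result.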

Now, what is an even better practice in MT evaluation than using SacreBLEU (well, I mean in addition to using SacreBLEU)? To publish your translations of your dev and especially test sets (which should be some standard sets for a given language pair, e.g. WMT newstests). However,

  • http://matrix.statmt.org is now deprecated (and was probably not meant for uploading submissions several years after the deadline).
  • https://ocelot.mteval.org (by @cfedermann et al.) does not accept submissions for arbitrary WMT newstests yet.
  • https://paperswithcode.com/task/machine-translation seems to only copy the BLEU reported in the paper, without recomputing it or storing the translations.

So my recommendation is to upload the translations anywhere else (e.g. to the GitHub repo of your paper).

This last paragraph is not really related to this PR, but I consider it quite important.

@cfedermann

OCELoT will accept arbitrary WMT submissions soon.
