Add new benchmark: Galician bench #2155

Open
wants to merge 8 commits into
base: main
Changes from 6 commits
80 changes: 80 additions & 0 deletions lm_eval/tasks/galician_bench/README.md
@@ -0,0 +1,80 @@
# GalicianBench

### Paper

GalicianBench is a benchmark for evaluating language models on Galician tasks. That is, it evaluates the ability of a language model to understand and generate Galician text. GalicianBench combines pre-existing, open datasets with datasets developed exclusively for this benchmark. All the details of GalicianBench will be published in a paper soon.

The new evaluation datasets included in GalicianBench are:
| Task | Category | Homepage |
|:-------------:|:-----:|:-----:|
| Belebele_gl | Reading Comprehension | https://huggingface.co/datasets/proxectonos/belebele_gl |
| GalCoLA | Linguistic Acceptability | https://huggingface.co/datasets/proxectonos/galcola |
| MGSM_gl | Math | https://huggingface.co/datasets/proxectonos/mgsm_gl |
| Parafrases_gl | Paraphrasing | https://huggingface.co/datasets/proxectonos/parafrases_gl |
| PAWS-gl | Paraphrasing | https://huggingface.co/datasets/proxectonos/PAWS-gl |
| OpenBookQA_gl | Question Answering | https://huggingface.co/datasets/proxectonos/openbookqa_gl |
| Summarization_gl | Summarization | https://huggingface.co/datasets/proxectonos/summarization_gl |
| TruthfulQA_gl | Truthfulness | https://huggingface.co/datasets/proxectonos/truthfulqa_gl |
| xnli_gl | NLI | https://huggingface.co/datasets/proxectonos/xnli_gl |
| xstorycloze_gl | Commonsense Reasoning | https://huggingface.co/datasets/proxectonos/xstorycloze_gl |

The datasets included in GalicianBench that have been made public in previous publications are:

| Task | Category | Paper title | Homepage |
|:-------------:|:-----:|:-------------:|:-----:|
| FLORES_gl | Translation | [The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation](https://arxiv.org/abs/2106.03193) | https://huggingface.co/datasets/facebook/flores |


### Citation
Paper for GalicianBench coming soon.

### Groups and Tasks

#### Groups

- `galician_bench`: All tasks included in GalicianBench.
- `flores_gl`: All FLORES translation tasks from or to Galician.


#### Tasks

The following tasks evaluate models on the GalicianBench datasets using various scoring methods.
- `belebele_glg_Latn`
- `flores_gl`
- `flores_gl-ca`
- `flores_gl-de`
- `flores_gl-en`
- `flores_gl-es`
- `flores_gl-eu`
- `flores_gl-fr`
- `flores_gl-it`
- `flores_gl-pt`
- `flores_ca-gl`
- `flores_de-gl`
- `flores_en-gl`
- `flores_es-gl`
- `flores_eu-gl`
- `flores_fr-gl`
- `flores_it-gl`
- `flores_pt-gl`
- `galcola`
- `summarization_gl`
- `parafrases_gl`
- `paws_gl`
- `openbookqa_gl`
- `mgsm_direct_gl`
- `truthfulqa_gl`
- `xnli_gl`
- `xstorycloze_gl`
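
The `flores_gl-*` and `flores_*-gl` task names above follow a regular pattern: one task per translation direction between Galician and each of the other eight languages. A minimal sketch of that naming scheme, assuming the two-letter codes shown in the list; `flores_task_names` is a hypothetical helper and not part of the harness:

```python
# Hypothetical helper, not part of lm-evaluation-harness: it only rebuilds
# the FLORES task names listed in this README from two-letter codes.
MAIN_LANG = "gl"
OTHER_LANGS = ("ca", "de", "en", "es", "eu", "fr", "it", "pt")


def flores_task_names(main=MAIN_LANG, others=OTHER_LANGS):
    """Return one task name per translation direction involving `main`."""
    names = []
    for other in others:
        names.append(f"flores_{main}-{other}")  # Galician -> other language
        names.append(f"flores_{other}-{main}")  # other language -> Galician
    return names


if __name__ == "__main__":
    for name in flores_task_names():
        print(name)
```

Each printed name should correspond to one of the per-direction YAML files in `flores_gl/`.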

### Checklist

* [x] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation?
* [ ] Yes, original implementation contributed by author of the benchmark

If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
9 changes: 9 additions & 0 deletions lm_eval/tasks/galician_bench/belebele_glg_Latn.yaml
@@ -0,0 +1,9 @@
group:
- belebele
task: belebele_glg_Latn
include: ../belebele/_default_template_yaml
dataset_path: proxectonos/belebele_gl
fewshot_split: train
test_split: train
metadata:
  version: 1.0
28 changes: 28 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/_flores_common_yaml
@@ -0,0 +1,28 @@
group: flores
# Review comment (Contributor), suggested change: replace `group: flores` with `tag: flores`.
dataset_path: facebook/flores
dataset_name: all
output_type: generate_until
#! The test split of flores is not publicly available! (See paper section 6.1)
#! We are using `dev` and `devtest` splits, but they're mapped to train/validation/test in `data/flores/flores.py`.
training_split: dev
validation_split: dev
test_split: devtest
fewshot_split: dev
target_delimiter: ''
generation_kwargs:
  until:
    - "\n"
metric_list:
  - metric: bleu
    aggregation: bleu
    higher_is_better: true
  - metric: ter
    aggregation: ter
    higher_is_better: false
  - metric: chrf
    aggregation: chrf
    higher_is_better: true
metadata:
  version: 1.0
dataset_kwargs:
  trust_remote_code: true
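
Review note: with `until: ["\n"]`, generation stops at the first newline, so only the first generated line is scored against the reference translation. A minimal sketch of that truncation, assuming a hypothetical helper `truncate_at_stop`; the harness's own handling of stop sequences may differ in detail:

```python
def truncate_at_stop(text, stops):
    """Cut `text` at the earliest occurrence of any non-empty stop sequence."""
    cut = len(text)
    for stop in stops:
        if not stop:
            continue  # an empty stop string would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


if __name__ == "__main__":
    generated = " Ola, mundo.\nEnglish sentence: Hello again."
    print(repr(truncate_at_stop(generated, ["\n"])))
```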
115 changes: 115 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/create-yamls_flores_gl.py
@@ -0,0 +1,115 @@
"""
Script to generate task YAMLs for the FLORES-200 dataset.
Based on `tasks/translation/utils.py`.
"""

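import argparse
import itertools

import yaml
from langcodes import Language

# utils
# `itertools` must be imported as a module: `from itertools import *` does not bind the name `itertools`.
flatten = lambda l: list(itertools.chain(*l))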

# constants
_LANGUAGES = [
"ace_Arab", "bam_Latn", "dzo_Tibt", "hin_Deva", "khm_Khmr", "mag_Deva", "pap_Latn", "sot_Latn", "tur_Latn",
"ace_Latn", "ban_Latn", "ell_Grek", "hne_Deva", "kik_Latn", "mai_Deva", "pbt_Arab", "spa_Latn", "twi_Latn",
"acm_Arab", "bel_Cyrl", "eng_Latn", "hrv_Latn", "kin_Latn", "mal_Mlym", "pes_Arab", "srd_Latn", "tzm_Tfng",
"acq_Arab", "bem_Latn", "epo_Latn", "hun_Latn", "kir_Cyrl", "mar_Deva", "plt_Latn", "srp_Cyrl", "uig_Arab",
"aeb_Arab", "ben_Beng", "est_Latn", "hye_Armn", "kmb_Latn", "min_Arab", "pol_Latn", "ssw_Latn", "ukr_Cyrl",
"afr_Latn", "bho_Deva", "eus_Latn", "ibo_Latn", "kmr_Latn", "min_Latn", "por_Latn", "sun_Latn", "umb_Latn",
"ajp_Arab", "bjn_Arab", "ewe_Latn", "ilo_Latn", "knc_Arab", "mkd_Cyrl", "prs_Arab", "swe_Latn", "urd_Arab",
"aka_Latn", "bjn_Latn", "fao_Latn", "ind_Latn", "knc_Latn", "mlt_Latn", "quy_Latn", "swh_Latn", "uzn_Latn",
"als_Latn", "bod_Tibt", "fij_Latn", "isl_Latn", "kon_Latn", "mni_Beng", "ron_Latn", "szl_Latn", "vec_Latn",
"amh_Ethi", "bos_Latn", "fin_Latn", "ita_Latn", "kor_Hang", "mos_Latn", "run_Latn", "tam_Taml", "vie_Latn",
"apc_Arab", "bug_Latn", "fon_Latn", "jav_Latn", "lao_Laoo", "mri_Latn", "rus_Cyrl", "taq_Latn", "war_Latn",
"arb_Arab", "bul_Cyrl", "fra_Latn", "jpn_Jpan", "lij_Latn", "mya_Mymr", "sag_Latn", "taq_Tfng", "wol_Latn",
"arb_Latn", "cat_Latn", "fur_Latn", "kab_Latn", "lim_Latn", "nld_Latn", "san_Deva", "tat_Cyrl", "xho_Latn",
"ars_Arab", "ceb_Latn", "fuv_Latn", "kac_Latn", "lin_Latn", "nno_Latn", "sat_Olck", "tel_Telu", "ydd_Hebr",
"ary_Arab", "ces_Latn", "gaz_Latn", "kam_Latn", "lit_Latn", "nob_Latn", "scn_Latn", "tgk_Cyrl", "yor_Latn",
"arz_Arab", "cjk_Latn", "gla_Latn", "kan_Knda", "lmo_Latn", "npi_Deva", "shn_Mymr", "tgl_Latn", "yue_Hant",
"asm_Beng", "ckb_Arab", "gle_Latn", "kas_Arab", "ltg_Latn", "nso_Latn", "sin_Sinh", "tha_Thai", "zho_Hans",
"ast_Latn", "crh_Latn", "glg_Latn", "kas_Deva", "ltz_Latn", "nus_Latn", "slk_Latn", "tir_Ethi", "zho_Hant",
"awa_Deva", "cym_Latn", "grn_Latn", "kat_Geor", "lua_Latn", "nya_Latn", "slv_Latn", "tpi_Latn", "zsm_Latn",
"ayr_Latn", "dan_Latn", "guj_Gujr", "kaz_Cyrl", "lug_Latn", "oci_Latn", "smo_Latn", "tsn_Latn", "zul_Latn",
"azb_Arab", "deu_Latn", "hat_Latn", "kbp_Latn", "luo_Latn", "ory_Orya", "sna_Latn", "tso_Latn",
"azj_Latn", "dik_Latn", "hau_Latn", "kea_Latn", "lus_Latn", "pag_Latn", "snd_Arab", "tuk_Latn",
"bak_Cyrl", "dyu_Latn", "heb_Hebr", "khk_Cyrl", "lvs_Latn", "pan_Guru", "som_Latn", "tum_Latn"
]
LANGUAGE_PAIRS = [(a, b) for idx, a in enumerate(_LANGUAGES) for b in _LANGUAGES[idx + 1:]]

LANGUAGES_OF_INTEREST = ["cat_Latn", "spa_Latn", "eng_Latn", "glg_Latn", "eus_Latn", "ita_Latn", "deu_Latn", "por_Latn", "fra_Latn"]
MAIN_LANG = "glg_Latn"
LANGUAGE_PAIRS = [(a, b) for (a, b) in LANGUAGE_PAIRS if a in LANGUAGES_OF_INTEREST and b in LANGUAGES_OF_INTEREST and MAIN_LANG in (a, b)]

# auxiliary functions

code_to_language_name = lambda code: Language.make(language=Language.get(code)["language"]).display_name()
code_to_short_name = lambda code: Language.get(code)["language"]
jinja_var = lambda s: "{{" + s + "}}" # wrapper to avoid having to escape { } in format strings

def doc_to_text(src: str, tgt: str) -> str:
    src_name, tgt_name = map(code_to_language_name, [src, tgt])

    return f"""\
{src_name} sentence: {jinja_var('sentence_' + src)}
{tgt_name} sentence:"""

def doc_to_target(tgt: str) -> str:

    return f"{jinja_var('sentence_' + tgt)}"

# main function

def gen_lang_yamls(output_dir: str, overwrite: bool) -> None:
    """
    Generate a YAML file for each translation direction.
    """

    err = []
    for src, tgt in LANGUAGE_PAIRS:

        # do both translation directions for each lang pair
        for src, tgt in [(src, tgt), (tgt, src)]:
            lang_pair_name = f"{code_to_short_name(src)}-{code_to_short_name(tgt)}"
            yaml_file_name = f"flores_{lang_pair_name}.yaml"

            try:
                with open(f"{output_dir}/{yaml_file_name}", "w" if overwrite else "x", encoding="utf-8") as outfile:
                    print(f"Creating {yaml_file_name}...")
                    outfile.write("# File generated by `create-yamls.py`\n")
                    yaml.dump(
                        {
                            # "group": [f"{BENCH_NAME}_bench", f"{BENCH_NAME}_bench_flores"],
                            # "group": "flores_gl",
                            "include": "_flores_common_yaml",
                            "task": f"flores_{lang_pair_name}",
                            "doc_to_text": doc_to_text(src, tgt),
                            "doc_to_target": doc_to_target(tgt),
                        },
                        outfile,
                        sort_keys=False,
                    )

            except FileExistsError:
                err.append(yaml_file_name)

    if len(err) > 0:
        raise FileExistsError(
            "Files were not created because they already exist:"
            f" {', '.join(err)}"
            "\nUse flag --overwrite to overwrite them."
        )


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--overwrite", default=False, action="store_true", help="Overwrite files if they already exist")
    parser.add_argument("--output-dir", default=".", help="Directory to write yaml files to")
    args = parser.parse_args()

    gen_lang_yamls(output_dir=args.output_dir, overwrite=args.overwrite)

if __name__ == "__main__":
    main()
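
Review note: the pair selection and prompt template in this script can be checked without installing `langcodes` or `yaml`. The sketch below is a simplified stand-in under stated assumptions: the hard-coded `LANGUAGE_NAMES` table replaces the `langcodes` lookup, and `directions` and `prompt` are hypothetical names, not functions from the script.

```python
from itertools import combinations

# Assumption: a hard-coded name table stands in for the `langcodes` lookup
# used by the script, so this sketch runs with the standard library only.
LANGUAGE_NAMES = {
    "cat_Latn": "Catalan",
    "spa_Latn": "Spanish",
    "eng_Latn": "English",
    "glg_Latn": "Galician",
    "eus_Latn": "Basque",
    "ita_Latn": "Italian",
    "deu_Latn": "German",
    "por_Latn": "Portuguese",
    "fra_Latn": "French",
}
MAIN_LANG = "glg_Latn"


def jinja_var(name):
    """Wrap a field name in Jinja-style double braces."""
    return "{{" + name + "}}"


def directions(codes, main):
    """Yield (src, tgt) in both directions for every pair that includes `main`."""
    for a, b in combinations(codes, 2):
        if main in (a, b):
            yield (a, b)
            yield (b, a)


def prompt(src, tgt):
    """Build the two-line prompt: source sentence, then the open target slot."""
    return (
        f"{LANGUAGE_NAMES[src]} sentence: {jinja_var('sentence_' + src)}\n"
        f"{LANGUAGE_NAMES[tgt]} sentence:"
    )


if __name__ == "__main__":
    for src, tgt in directions(LANGUAGE_NAMES, MAIN_LANG):
        print(f"{src} -> {tgt}")
    print(prompt("cat_Latn", "glg_Latn"))
```

The newline between the two prompt lines is what the blank line inside the single-quoted `doc_to_text` scalar encodes in the generated YAML files.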
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_ca-gl.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_ca-gl
doc_to_text: 'Catalan sentence: {{sentence_cat_Latn}}

  Galician sentence:'
doc_to_target: '{{sentence_glg_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_de-gl.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_de-gl
doc_to_text: 'German sentence: {{sentence_deu_Latn}}

  Galician sentence:'
doc_to_target: '{{sentence_glg_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_en-gl.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_en-gl
doc_to_text: 'English sentence: {{sentence_eng_Latn}}

  Galician sentence:'
doc_to_target: '{{sentence_glg_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_es-gl.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_es-gl
doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}

  Galician sentence:'
doc_to_target: '{{sentence_glg_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_eu-gl.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_eu-gl
doc_to_text: 'Basque sentence: {{sentence_eus_Latn}}

  Galician sentence:'
doc_to_target: '{{sentence_glg_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_fr-gl.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_fr-gl
doc_to_text: 'French sentence: {{sentence_fra_Latn}}

  Galician sentence:'
doc_to_target: '{{sentence_glg_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_gl-ca.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_gl-ca
doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}

  Catalan sentence:'
doc_to_target: '{{sentence_cat_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_gl-de.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_gl-de
doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}

  German sentence:'
doc_to_target: '{{sentence_deu_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_gl-en.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_gl-en
doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}

  English sentence:'
doc_to_target: '{{sentence_eng_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_gl-es.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_gl-es
doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}

  Spanish sentence:'
doc_to_target: '{{sentence_spa_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_gl-eu.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_gl-eu
doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}

  Basque sentence:'
doc_to_target: '{{sentence_eus_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_gl-fr.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_gl-fr
doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}

  French sentence:'
doc_to_target: '{{sentence_fra_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_gl-it.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_gl-it
doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}

  Italian sentence:'
doc_to_target: '{{sentence_ita_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_gl-pt.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_gl-pt
doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}

  Portuguese sentence:'
doc_to_target: '{{sentence_por_Latn}}'
23 changes: 23 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_gl.yaml
@@ -0,0 +1,23 @@
group: flores_gl
task:
- flores_es-gl
- flores_gl-es
- flores_en-gl
- flores_gl-en
- flores_eu-gl
- flores_gl-eu
- flores_pt-gl
- flores_gl-pt
- flores_it-gl
- flores_gl-it
- flores_fr-gl
- flores_gl-fr
- flores_ca-gl
- flores_gl-ca
- flores_gl-de
- flores_de-gl
aggregate_metric_list:
  - metric: bleu
    aggregation: mean
metadata:
  version: 1.0
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_it-gl.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_it-gl
doc_to_text: 'Italian sentence: {{sentence_ita_Latn}}

  Galician sentence:'
doc_to_target: '{{sentence_glg_Latn}}'
7 changes: 7 additions & 0 deletions lm_eval/tasks/galician_bench/flores_gl/flores_pt-gl.yaml
@@ -0,0 +1,7 @@
# File generated by `create-yamls.py`
include: _flores_common_yaml
task: flores_pt-gl
doc_to_text: 'Portuguese sentence: {{sentence_por_Latn}}

  Galician sentence:'
doc_to_target: '{{sentence_glg_Latn}}'