EleutherAI · zxcvuser · Jul 19, 2024 · Jul 22, 2024 · Jul 29, 2024 · Jul 30, 2024
@@ -0,0 +1,80 @@
+# GalicianBench
+
+### Paper
+
+GalicianBench is a benchmark for evaluating language models on Galician tasks. That is, it evaluates the ability of a language model to understand and generate Galician text. GalicianBench offers a combination of pre-existing, open datasets and datasets developed exclusively for this benchmark. All the details of GalicianBench will be published in a paper soon.
+
+The new evaluation datasets included in GalicianBench are:
+| Task | Category | Homepage |
+|:-------------:|:-----:|:-----:|
+| Belebele_gl | Reading Comprehension | https://huggingface.co/datasets/proxectonos/belebele_gl |
+| GalCoLA | Linguistic Acceptability | https://huggingface.co/datasets/proxectonos/galcola |
+| MGSM_gl | Math | https://huggingface.co/datasets/proxectonos/mgsm_gl |
+| Parafrases_gl | Paraphrasing | https://huggingface.co/datasets/proxectonos/parafrases_gl |
+| PAWS-gl | Paraphrasing | https://huggingface.co/datasets/proxectonos/PAWS-gl |
+| OpenBookQA_gl | Question Answering | https://huggingface.co/datasets/proxectonos/openbookqa_gl |
+| Summarization_gl | Summarization | https://huggingface.co/datasets/proxectonos/summarization_gl |
+| TruthfulQA_gl | Truthfulness | https://huggingface.co/datasets/proxectonos/truthfulqa_gl |
+| xnli_gl | NLI | https://huggingface.co/datasets/proxectonos/xnli_gl |
+| xstorycloze_gl | Commonsense Reasoning | https://huggingface.co/datasets/proxectonos/xstorycloze_gl |
+
+The datasets included in GalicianBench that have been made public in previous publications are:
+
+| Task | Category | Paper title | Homepage |
+|:-------------:|:-----:|:-------------:|:-----:|
+| FLORES_gl | Translation | [The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation](https://arxiv.org/abs/2106.03193) | https://huggingface.co/datasets/facebook/flores |
+
+
+### Citation
+Paper for GalicianBench coming soon.
+
+### Groups and Tasks
+
+#### Groups
+
+- `galician_bench`: All tasks included in GalicianBench.
+- `flores_gl`: All FLORES translation tasks from or to Galician.
+
+
+#### Tasks
+
+The following tasks evaluate models on the GalicianBench datasets using various scoring methods.
+ - `belebele_glg_Latn`
+ - `flores_gl`
+ - `flores_gl-ca`
+ - `flores_gl-de`
+ - `flores_gl-en`
+ - `flores_gl-es`
+ - `flores_gl-eu`
+ - `flores_gl-fr`
+ - `flores_gl-it`
+ - `flores_gl-pt`
+ - `flores_ca-gl`
+ - `flores_de-gl`
+ - `flores_en-gl`
+ - `flores_es-gl`
+ - `flores_eu-gl`
+ - `flores_fr-gl`
+ - `flores_it-gl`
+ - `flores_pt-gl`
+ - `galcola`
+ - `summarization_gl`
+ - `parafrases_gl`
+ - `paws_gl`
+ - `openbookqa_gl`
+ - `mgsm_direct_gl`
+ - `truthfulqa_gl`
+ - `xnli_gl`
+ - `xstorycloze_gl`
+
+### Checklist
+
+* [x] Is the task an existing benchmark in the literature?
+  * [ ] Have you referenced the original paper that introduced the task?
+  * [ ] If yes, does the original paper provide a reference implementation?
+  * [ ] Yes, original implementation contributed by author of the benchmark
+
+If other tasks on this dataset are already supported:
+* [ ] Is the "Main" variant of this task clearly denoted?
+* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
@@ -0,0 +1,9 @@
+group:
+ - belebele
+task: belebele_glg_Latn
+include: ../belebele/_default_template_yaml
+dataset_path: proxectonos/belebele_gl
+fewshot_split: train
+test_split: train
+metadata:
+ version: 1.0
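Belebele_gl takes its prompt and scoring from the shared `../belebele/_default_template_yaml`, which is not part of this diff. Assuming that template uses lm-eval's `multiple_choice` output type, each answer option is scored by the model's log-likelihood and the highest-scoring option counts as the prediction. The sketch below illustrates that selection rule and a per-character length-normalized variant in the spirit of `acc_norm`; it is an approximation for illustration, not the harness code, and the log-likelihood values are hypothetical placeholders rather than model outputs.

```python
from typing import List


def pick_answer(loglikelihoods: List[float]) -> int:
    """Index of the option with the highest total log-likelihood (the `acc` rule)."""
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])


def pick_answer_normalized(loglikelihoods: List[float], choices: List[str]) -> int:
    """Index of the option with the highest log-likelihood per character (an `acc_norm`-style rule)."""
    scores = [ll / len(choice) for ll, choice in zip(loglikelihoods, choices)]
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(predictions: List[int], gold: List[int]) -> float:
    """Fraction of questions whose predicted option index equals the gold index."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for the four options of one question.
    lls = [-4.2, -1.3, -3.8, -2.9]
    print(pick_answer(lls))
```

Normalization only changes the outcome when option strings differ in length; if the options are single letters, both rules pick the same answer.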
@@ -0,0 +1,28 @@
+tag: flores
+dataset_path: facebook/flores
+dataset_name: all
+output_type: generate_until
+#! The test split of flores is not publicly available! (See paper section 6.1)
+#! We are using `dev` and `devtest` splits, but they're mapped to train/validation/test in `data/flores/flores.py`.
+training_split: dev
+validation_split: dev
+test_split: devtest
+fewshot_split: dev
+target_delimiter: ''
+generation_kwargs:
+ until:
+ - "\n"
+metric_list:
+  - metric: bleu
+    aggregation: bleu
+    higher_is_better: true
+  - metric: ter
+    aggregation: ter
+    higher_is_better: false
+  - metric: chrf
+    aggregation: chrf
+    higher_is_better: true
+metadata:
+ version: 1.0
+dataset_kwargs:
+ trust_remote_code: true
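With `output_type: generate_until` and `until: ["\n"]`, the model's continuation is cut at the first newline, so only a single translated sentence reaches the BLEU, TER and chrF scorers. The following stdlib-only sketch illustrates that stop-sequence rule; the helper `truncate_at_stop` is hypothetical and is not lm-eval's actual implementation.

```python
from typing import Sequence


def truncate_at_stop(generation: str, stop_sequences: Sequence[str]) -> str:
    """Cut the generated text at the earliest occurrence of any stop sequence.

    Hypothetical helper mirroring the effect of `generation_kwargs.until`.
    """
    cut = len(generation)
    for stop in stop_sequences:
        idx = generation.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generation[:cut]


if __name__ == "__main__":
    # Illustrative raw generation: the model keeps going after the translation.
    raw = " Bos días a todos.\nEnglish sentence: Good morning"
    print(repr(truncate_at_stop(raw, ["\n"])))
```

Because the prompt ends in `Galician sentence:` and `target_delimiter` is empty, the kept text may start with a space, as in this example.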
diff --git a/lm_eval/tasks/galician_bench/flores_gl/create-yamls_flores_gl.py b/lm_eval/tasks/galician_bench/flores_gl/create-yamls_flores_gl.py
@@ -0,0 +1,115 @@
+"""
+Script to generate task YAMLs for the FLORES-200 dataset.
+Based on `tasks/translation/utils.py`.
+"""
+
+import argparse
+import itertools
+import yaml
+from langcodes import Language
+
+# utils
+flatten = lambda l: list(itertools.chain(*l))
+
+# constants
+_LANGUAGES = [
+"ace_Arab", "bam_Latn", "dzo_Tibt", "hin_Deva", "khm_Khmr", "mag_Deva", "pap_Latn", "sot_Latn", "tur_Latn",
+"ace_Latn", "ban_Latn", "ell_Grek", "hne_Deva", "kik_Latn", "mai_Deva", "pbt_Arab", "spa_Latn", "twi_Latn",
+"acm_Arab", "bel_Cyrl", "eng_Latn", "hrv_Latn", "kin_Latn", "mal_Mlym", "pes_Arab", "srd_Latn", "tzm_Tfng",
+"acq_Arab", "bem_Latn", "epo_Latn", "hun_Latn", "kir_Cyrl", "mar_Deva", "plt_Latn", "srp_Cyrl", "uig_Arab",
+"aeb_Arab", "ben_Beng", "est_Latn", "hye_Armn", "kmb_Latn", "min_Arab", "pol_Latn", "ssw_Latn", "ukr_Cyrl",
+"afr_Latn", "bho_Deva", "eus_Latn", "ibo_Latn", "kmr_Latn", "min_Latn", "por_Latn", "sun_Latn", "umb_Latn",
+"ajp_Arab", "bjn_Arab", "ewe_Latn", "ilo_Latn", "knc_Arab", "mkd_Cyrl", "prs_Arab", "swe_Latn", "urd_Arab",
+"aka_Latn", "bjn_Latn", "fao_Latn", "ind_Latn", "knc_Latn", "mlt_Latn", "quy_Latn", "swh_Latn", "uzn_Latn",
+"als_Latn", "bod_Tibt", "fij_Latn", "isl_Latn", "kon_Latn", "mni_Beng", "ron_Latn", "szl_Latn", "vec_Latn",
+"amh_Ethi", "bos_Latn", "fin_Latn", "ita_Latn", "kor_Hang", "mos_Latn", "run_Latn", "tam_Taml", "vie_Latn",
+"apc_Arab", "bug_Latn", "fon_Latn", "jav_Latn", "lao_Laoo", "mri_Latn", "rus_Cyrl", "taq_Latn", "war_Latn",
+"arb_Arab", "bul_Cyrl", "fra_Latn", "jpn_Jpan", "lij_Latn", "mya_Mymr", "sag_Latn", "taq_Tfng", "wol_Latn",
+"arb_Latn", "cat_Latn", "fur_Latn", "kab_Latn", "lim_Latn", "nld_Latn", "san_Deva", "tat_Cyrl", "xho_Latn",
+"ars_Arab", "ceb_Latn", "fuv_Latn", "kac_Latn", "lin_Latn", "nno_Latn", "sat_Olck", "tel_Telu", "ydd_Hebr",
+"ary_Arab", "ces_Latn", "gaz_Latn", "kam_Latn", "lit_Latn", "nob_Latn", "scn_Latn", "tgk_Cyrl", "yor_Latn",
+"arz_Arab", "cjk_Latn", "gla_Latn", "kan_Knda", "lmo_Latn", "npi_Deva", "shn_Mymr", "tgl_Latn", "yue_Hant",
+"asm_Beng", "ckb_Arab", "gle_Latn", "kas_Arab", "ltg_Latn", "nso_Latn", "sin_Sinh", "tha_Thai", "zho_Hans",
+"ast_Latn", "crh_Latn", "glg_Latn", "kas_Deva", "ltz_Latn", "nus_Latn", "slk_Latn", "tir_Ethi", "zho_Hant",
+"awa_Deva", "cym_Latn", "grn_Latn", "kat_Geor", "lua_Latn", "nya_Latn", "slv_Latn", "tpi_Latn", "zsm_Latn",
+"ayr_Latn", "dan_Latn", "guj_Gujr", "kaz_Cyrl", "lug_Latn", "oci_Latn", "smo_Latn", "tsn_Latn", "zul_Latn",
+"azb_Arab", "deu_Latn", "hat_Latn", "kbp_Latn", "luo_Latn", "ory_Orya", "sna_Latn", "tso_Latn",
+"azj_Latn", "dik_Latn", "hau_Latn", "kea_Latn", "lus_Latn", "pag_Latn", "snd_Arab", "tuk_Latn",
+"bak_Cyrl", "dyu_Latn", "heb_Hebr", "khk_Cyrl", "lvs_Latn", "pan_Guru", "som_Latn", "tum_Latn"
+]
+LANGUAGE_PAIRS = [(a, b) for idx, a in enumerate(_LANGUAGES) for b in _LANGUAGES[idx + 1:]]
+
+LANGUAGES_OF_INTEREST = ["cat_Latn", "spa_Latn", "eng_Latn", "glg_Latn", "eus_Latn", "ita_Latn", "deu_Latn", "por_Latn", "fra_Latn"]
+MAIN_LANG = "glg_Latn"
+LANGUAGE_PAIRS = [(a, b) for (a, b) in LANGUAGE_PAIRS if a in LANGUAGES_OF_INTEREST and b in LANGUAGES_OF_INTEREST and MAIN_LANG in (a, b)]
+
+# auxiliary functions
+
+code_to_language_name = lambda code: Language.make(language=Language.get(code)["language"]).display_name()
+code_to_short_name = lambda code: Language.get(code)["language"]
+jinja_var = lambda s: "{{" + s + "}}" # wrapper to avoid having to escape { } in format strings
+
+def doc_to_text(src: str, tgt: str) -> str:
+    src_name, tgt_name = map(code_to_language_name, [src, tgt])
+
+    return f"""\
+{src_name} sentence: {jinja_var('sentence_' + src)}
+{tgt_name} sentence:"""
+
+def doc_to_target(tgt: str) -> str:
+
+    return f"{jinja_var('sentence_' + tgt)}"
+
+# main function
+
+def gen_lang_yamls(output_dir: str, overwrite: bool) -> None:
+    """
+    Generate a YAML file for each translation direction.
+    """
+
+    err = []
+    for src, tgt in LANGUAGE_PAIRS:
+
+        # do both translation directions for each lang pair
+        for src, tgt in [(src, tgt), (tgt, src)]:
+            lang_pair_name = f"{code_to_short_name(src)}-{code_to_short_name(tgt)}"
+            yaml_file_name = f"flores_{lang_pair_name}.yaml"
+
+            try:
+                with open(f"{output_dir}/{yaml_file_name}", "w" if overwrite else "x", encoding="utf-8") as outfile:
+                    print(f"Creating {yaml_file_name}...")
+                    outfile.write("# File generated by `create-yamls.py`\n")
+                    yaml.dump(
+                        {
+                            # "group": [f"{BENCH_NAME}_bench", f"{BENCH_NAME}_bench_flores"],
+                            # "group": "flores_gl",
+                            "include": "_flores_common_yaml",
+                            "task": f"flores_{lang_pair_name}",
+                            "doc_to_text": doc_to_text(src, tgt),
+                            "doc_to_target": doc_to_target(tgt),
+                        },
+                        outfile,
+                        sort_keys=False,
+                    )
+
+            except FileExistsError:
+                err.append(yaml_file_name)
+
+    if len(err) > 0:
+        raise FileExistsError(
+            "Files were not created because they already exist:"
+            f" {', '.join(err)}"
+            "\nUse flag --overwrite to overwrite them."
+        )
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--overwrite", default=False, action="store_true", help="Overwrite files if they already exist")
+    parser.add_argument("--output-dir", default=".", help="Directory to write yaml files to")
+    args = parser.parse_args()
+
+    gen_lang_yamls(output_dir=args.output_dir, overwrite=args.overwrite)
+
+if __name__ == "__main__":
+    main()
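The script keeps only the unordered FLORES pairs made of `glg_Latn` and one of the other eight languages of interest, then emits both directions, which yields 8 × 2 = 16 YAML files. The stdlib-only sketch below reproduces that naming logic so the expected task list can be checked without installing `langcodes`; the hard-coded `SHORT_CODES` mapping is an assumption standing in for `Language.get(code)["language"]`, and the order of names may differ from the script's, though the set is the same.

```python
from typing import Dict, List

# Assumed stand-in for langcodes: FLORES-200 code -> ISO 639-1 language code.
SHORT_CODES: Dict[str, str] = {
    "cat_Latn": "ca", "spa_Latn": "es", "eng_Latn": "en", "glg_Latn": "gl",
    "eus_Latn": "eu", "ita_Latn": "it", "deu_Latn": "de", "por_Latn": "pt",
    "fra_Latn": "fr",
}
MAIN_LANG = "glg_Latn"


def flores_task_names(languages: List[str], main_lang: str) -> List[str]:
    """Return `flores_{src}-{tgt}` names for both directions of every pair with main_lang."""
    names = []
    for other in languages:
        if other == main_lang:
            continue
        # Both directions, as in the script's inner loop.
        for src, tgt in [(main_lang, other), (other, main_lang)]:
            names.append(f"flores_{SHORT_CODES[src]}-{SHORT_CODES[tgt]}")
    return names


if __name__ == "__main__":
    tasks = flores_task_names(list(SHORT_CODES), MAIN_LANG)
    print(len(tasks))
    print(sorted(tasks))
```

Comparing this set against the `task:` list of the `flores_gl` group is a quick way to catch a direction that was generated but not registered, or vice versa.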
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_ca-gl
+doc_to_text: 'Catalan sentence: {{sentence_cat_Latn}}
+
+ Galician sentence:'
+doc_to_target: '{{sentence_glg_Latn}}'
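In these generated files, the blank line inside the single-quoted `doc_to_text` scalar is YAML line folding: it loads as a single `\n`, and the indentation of the continuation line is dropped, so the template is `Catalan sentence: {{sentence_cat_Latn}}\nGalician sentence:`. lm-eval renders such templates with Jinja2; the sketch below instead substitutes plain `{{field}}` placeholders with a regular expression, which is enough to preview the prompt for one FLORES row. The `render` helper and the sample sentences are hypothetical.

```python
import re
from typing import Dict

# Template strings as loaded from flores_ca-gl.yaml (the blank line folds to "\n").
DOC_TO_TEXT = "Catalan sentence: {{sentence_cat_Latn}}\nGalician sentence:"
DOC_TO_TARGET = "{{sentence_glg_Latn}}"

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, doc: Dict[str, str]) -> str:
    """Replace each {{field}} placeholder with doc[field] (a tiny subset of Jinja2)."""
    return _PLACEHOLDER.sub(lambda m: doc[m.group(1)], template)


if __name__ == "__main__":
    # Hypothetical FLORES-style row with only the two columns this task reads.
    doc = {
        "sentence_cat_Latn": "Bon dia a tothom.",
        "sentence_glg_Latn": "Bos días a todos.",
    }
    print(render(DOC_TO_TEXT, doc))
    print(render(DOC_TO_TARGET, doc))
```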
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_de-gl
+doc_to_text: 'German sentence: {{sentence_deu_Latn}}
+
+ Galician sentence:'
+doc_to_target: '{{sentence_glg_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_en-gl
+doc_to_text: 'English sentence: {{sentence_eng_Latn}}
+
+ Galician sentence:'
+doc_to_target: '{{sentence_glg_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_es-gl
+doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}
+
+ Galician sentence:'
+doc_to_target: '{{sentence_glg_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_eu-gl
+doc_to_text: 'Basque sentence: {{sentence_eus_Latn}}
+
+ Galician sentence:'
+doc_to_target: '{{sentence_glg_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_fr-gl
+doc_to_text: 'French sentence: {{sentence_fra_Latn}}
+
+ Galician sentence:'
+doc_to_target: '{{sentence_glg_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_gl-ca
+doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}
+
+ Catalan sentence:'
+doc_to_target: '{{sentence_cat_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_gl-de
+doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}
+
+ German sentence:'
+doc_to_target: '{{sentence_deu_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_gl-en
+doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}
+
+ English sentence:'
+doc_to_target: '{{sentence_eng_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_gl-es
+doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}
+
+ Spanish sentence:'
+doc_to_target: '{{sentence_spa_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_gl-eu
+doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}
+
+ Basque sentence:'
+doc_to_target: '{{sentence_eus_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_gl-fr
+doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}
+
+ French sentence:'
+doc_to_target: '{{sentence_fra_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_gl-it
+doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}
+
+ Italian sentence:'
+doc_to_target: '{{sentence_ita_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_gl-pt
+doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}
+
+ Portuguese sentence:'
+doc_to_target: '{{sentence_por_Latn}}'
@@ -0,0 +1,23 @@
+group: flores_gl
+task:
+ - flores_es-gl
+ - flores_gl-es
+ - flores_en-gl
+ - flores_gl-en
+ - flores_eu-gl
+ - flores_gl-eu
+ - flores_pt-gl
+ - flores_gl-pt
+ - flores_it-gl
+ - flores_gl-it
+ - flores_fr-gl
+ - flores_gl-fr
+ - flores_ca-gl
+ - flores_gl-ca
+ - flores_gl-de
+ - flores_de-gl
+aggregate_metric_list:
+  - metric: bleu
+    aggregation: mean
+metadata:
+ version: 1.0
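The `flores_gl` group reports a single BLEU score through `aggregate_metric_list` with `aggregation: mean`. The sketch below shows the two averaging schemes such an aggregate can refer to: a plain mean over the subtasks, or a mean weighted by each subtask's number of documents. Which one applies is controlled by lm-eval's `weight_by_size` option, which this config does not set. Because every direction is evaluated on the same `devtest` split, all subtasks have the same number of documents, so both schemes give the same value here. The scores and sizes below are placeholders, not results.

```python
from typing import Dict


def mean_score(scores: Dict[str, float]) -> float:
    """Unweighted mean of per-task scores."""
    return sum(scores.values()) / len(scores)


def size_weighted_mean(scores: Dict[str, float], sizes: Dict[str, int]) -> float:
    """Mean of per-task scores weighted by each task's number of documents."""
    total = sum(sizes[task] for task in scores)
    return sum(scores[task] * sizes[task] for task in scores) / total


if __name__ == "__main__":
    # Placeholder BLEU scores and sizes for two directions (not real results).
    bleu = {"flores_gl-es": 20.0, "flores_es-gl": 30.0}
    sizes = {"flores_gl-es": 100, "flores_es-gl": 100}
    print(mean_score(bleu), size_weighted_mean(bleu, sizes))
```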
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_it-gl
+doc_to_text: 'Italian sentence: {{sentence_ita_Latn}}
+
+ Galician sentence:'
+doc_to_target: '{{sentence_glg_Latn}}'
@@ -0,0 +1,7 @@
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_pt-gl
+doc_to_text: 'Portuguese sentence: {{sentence_por_Latn}}
+
+ Galician sentence:'
+doc_to_target: '{{sentence_glg_Latn}}'