Why is this evaluation interesting?
This focuses on 16 African languages, evaluated on three knowledge QA and reasoning tasks: AfriMMLU, AfriMGSM and AfriXNLI, human-translated from MMLU, MGSM and XNLI respectively.
This sounds like a great idea! Do you want to add them yourself to the library and open a PR? We've got a guide here for adding tasks and one here specific to multilingual evaluations :)
See the tasks.py file.
Currently only the Swahili subset is available, because we don't have correct translations for the anchor words that are required during evaluation (e.g. "answer"/"question").
If you are a native speaker in any of the languages we would love your help in improving that!
See the guide on how to add translations:
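To make the "anchor words" idea concrete, here is a minimal, hypothetical sketch of what a per-language translation-literals entry could look like. The class and field names below are illustrative assumptions, not lighteval's actual API; the Swahili words "Swali" (question) and "Jibu" (answer) would still need native-speaker verification.

```python
from dataclasses import dataclass


@dataclass
class TranslationLiterals:
    """Illustrative container for the anchor words a prompt template
    needs in each target language (hypothetical, not lighteval's API)."""
    language: str
    question_word: str  # e.g. "Question" in English
    answer_word: str    # e.g. "Answer" in English


# Assumed Swahili anchor words -- to be checked by a native speaker.
SWAHILI = TranslationLiterals(
    language="swa",
    question_word="Swali",
    answer_word="Jibu",
)


def build_prompt(literals: TranslationLiterals, question: str) -> str:
    """Build a QA prompt using the language-specific anchor words."""
    return f"{literals.question_word}: {question}\n{literals.answer_word}:"


print(build_prompt(SWAHILI, "2 + 2 = ?"))
```

Without a verified entry like this for a language, the prompt would fall back to English anchor words and skew the evaluation, which is why only the Swahili subset is enabled for now.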
Evaluation short description
How widely is it used in the community?
Evaluation metadata: IrokoBench
Provide all available
Evaluation metadata: Uhura
Provide all available
Evaluation metadata: SIB-200
Provide all available
@NathanHB