Cetvel: A Unified Benchmark for Evaluating Turkish LLMs

Cetvel extends the lm-eval-harness tool with tasks and datasets for benchmarking Turkish Large Language Models (LLMs). It covers a variety of tasks curated to assess different aspects of model performance in Turkish. Our primary goal is to objectively evaluate how well large language models understand and process Turkish.

Tasks

Extractive Question Answering
Multiple Choice Question Answering
Natural Language Inference
Text Classification
Machine Translation
Summarization
Grammatical Error Correction

Installation

Clone the repository with the --recursive flag so that the submodules are fetched as well:

git clone git@github.com:KUIS-AI/cetvel.git --recursive

Create a virtual environment with any tool of your choice (e.g., conda or virtualenv) and install the core PyTorch dependencies:

conda create -n cetvel python=3.9
conda activate cetvel
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118

Note that we have only tested Cetvel with the PyTorch (2.3.1) and CUDA (11.8) versions specified above.
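To confirm that an environment matches the tested versions, a small check along the following lines may help. This is a minimal sketch, not part of Cetvel: the names `matches_tested` and `check_installed` are hypothetical, and it assumes that `torch.__version__` may carry a local build suffix such as `+cu118` and that `torch.version.cuda` is `None` on CPU-only builds.

```python
# Versions stated in this README as the only tested configuration.
TESTED_TORCH = "2.3.1"
TESTED_CUDA = "11.8"


def matches_tested(torch_version, cuda_version):
    """Return True if the given version strings match the tested setup.

    torch_version may look like "2.3.1+cu118"; the part after "+" is a
    local build tag and is ignored here. cuda_version may be None.
    """
    base_version = torch_version.split("+")[0]
    return base_version == TESTED_TORCH and cuda_version == TESTED_CUDA


def check_installed():
    """Check the currently installed PyTorch (hypothetical helper).

    torch is imported lazily so the comparison above stays usable
    even where PyTorch is not installed.
    """
    import torch

    return matches_tested(torch.__version__, torch.version.cuda)
```

Calling `check_installed()` inside the activated environment returns `False` when either version differs, which is a hint that results may not be comparable with the tested setup.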

Install the evaluation harness and other dependencies:

pip install toml
pip install -e ./lm-evaluation-harness
pip install -r requirements.txt

Usage

Cetvel uses the same command-line interface as lm-eval-harness. Here is an example command:

python -m lm_eval --model hf --include_path ./tasks/ \
 --model_args pretrained=openai-community/gpt2 \
 --tasks exams_tr,xquad_tr,tquad,turkish_plu \
 --device cuda:0 --batch_size 4 --write_out --log_samples --output_path outs

For more details on usage and other evaluation settings, refer to the lm-eval-harness repository.

Check out the examples folder for more examples that run all tasks with different models.
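When evaluating several models on the same tasks, the example command can be assembled programmatically. The sketch below is an illustration only and is not taken from the examples folder: `build_lm_eval_command` and `run_models` are hypothetical helpers, and the flags are the ones that appear in the example command above. Writing each model's results to its own subfolder is likewise an assumption made for the illustration.

```python
import subprocess
import sys


def build_lm_eval_command(pretrained, tasks, device="cuda:0", batch_size=4,
                          output_path="outs", include_path="./tasks/"):
    """Build the argument list for one lm_eval run (hypothetical helper)."""
    return [
        sys.executable, "-m", "lm_eval",
        "--model", "hf",
        "--include_path", include_path,
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),  # lm_eval expects a comma-separated list
        "--device", device,
        "--batch_size", str(batch_size),
        "--write_out",
        "--log_samples",
        "--output_path", output_path,
    ]


def run_models(models, tasks):
    """Run the same tasks for each model, one process per model."""
    for pretrained in models:
        # Assumption: one output subfolder per model, named after the
        # last component of the model identifier.
        out_dir = "outs/" + pretrained.split("/")[-1]
        command = build_lm_eval_command(pretrained, tasks, output_path=out_dir)
        subprocess.run(command, check=True)
```

For instance, `run_models(["openai-community/gpt2"], ["exams_tr", "xquad_tr"])` would launch one evaluation process from the repository root, assuming the installation steps above have been completed.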

Task Details

Task	Datasets	Metrics
Extractive Question Answering	xquad, tquad, MKQA-tr	Exact Match, F1
Multiple Choice Question Answering	EXAMS, Belebele, Turkish PLU, XCOPA	Accuracy
Text Classification	IronyTR, TRClaim-19, news_cat, OffensEval-TR, STSb-TR, X-FACT	Accuracy
Natural Language Inference	XNLI, SNLI-tr, MNLI-tr	Accuracy
Machine Translation	wmt2016	WER, BLEU
Summarization	TurkishPLU, MLSum, XLSum, WikiLingua	ROUGE
Grammatical Error Correction	gecturk	Exact Match
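As a rough illustration of the two extractive question answering metrics listed above, the following sketch computes exact match and token-level F1 between a predicted and a reference answer. It is not Cetvel's or lm-eval-harness's implementation: the normalization (lowercasing, stripping ASCII punctuation, collapsing whitespace) is a simplifying assumption, and the function names are hypothetical. Python's default lowercasing also does not follow Turkish casing rules for dotted and dotless "i", so a real scorer may normalize differently.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact match rewards only answers that are identical after normalization, while F1 gives partial credit when the prediction contains extra or missing tokens, which is why the two are usually reported together.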

Citation

If you find Cetvel useful for your research, please cite it:

@misc{kuisai2024cetvel,
    title={Cetvel: A Unified Benchmark for Evaluating Turkish LLMs},
    author={Ilker Kesen and Mustafa Cemil Guney and Aykut Erdem and Gozde Gul Sahin},
    year={2024},
    url={https://github.com/KUIS-AI/cetvel}
}
