huggingface / lighteval Public

Issues: huggingface/lighteval


54 Open 93 Closed

Issues list

[FT] Support llama.cpp inference
Labels: feature request
#402 opened Nov 22, 2024 by JoelNiklaus

[FT] Add Gemba MQM Translation Metric
Labels: feature request
#397 opened Nov 19, 2024 by JoelNiklaus

[FT] Is it possible to save the predictions to prevent rerunning expensive inference
Labels: feature request
#396 opened Nov 19, 2024 by JoelNiklaus

[BUG] Can't use lighteval to evaluate the nanotron
Labels: bug
#395 opened Nov 19, 2024 by alexchen4ai

[FT] Evaluation using a multi-document RAG based on statistical tools and LLM as judge
Labels: feature request
#379 opened Oct 30, 2024 by louisbrulenaudet

[EVAL]: Add more African Benchmarks
Labels: good first issue, help wanted, new task
#373 opened Oct 24, 2024 by dadelani

[FT] Pipeline does not fully handle trust_remote_code to load dataset
Labels: feature request
#362 opened Oct 15, 2024 by Sanahm

[FT] More general approach than output_regex to model answer extraction
Labels: feature request
#360 opened Oct 14, 2024 by sadra-barikbin

[FT] Single token completion loglikelihood auto-detection
Labels: feature request, low prio
#355 opened Oct 10, 2024 by hynky1999

[BUG] batch_size = auto:1 issue
Labels: bug
#353 opened Oct 9, 2024 by alozowski

[BUG] assertion error assert text[: len(left)] == left on MATH wen Qwen-Math-2.5
Labels: bug
#345 opened Oct 7, 2024 by d1shs0ap

[EVAL] Add ArenaHardAuto
Labels: new task, prio
#325 opened Sep 23, 2024 by lewtun

[EVAL] Add RewardBench
Labels: new task
#324 opened Sep 23, 2024 by lewtun

[FT] Enable batched dataset_filter
Labels: feature request
#322 opened Sep 21, 2024 by chuandudx

[BUG] AttributeError: 'str' object has no attribute 'category'
Labels: bug
#320 opened Sep 18, 2024 by Vanessa-Taing

[FT] pass trust_remote_code as flag for loading datasets with custom code
Labels: feature request
#314 opened Sep 16, 2024 by chuandudx

[FT] Provide an interface for easier edit of parametrizable metrics
Labels: feature request
#312 opened Sep 16, 2024 by clefourrier

[FT] Remove obsolete config properties (frozen, output_regex)
Labels: feature request
#305 opened Sep 13, 2024 by hynky1999

[BUG] Question on batch preparation in MMLU evaluation
Labels: bug
#288 opened Sep 4, 2024 by JefferyChen453

[BUG] Nanotron batch detection doesn't work
Labels: bug
#286 opened Sep 3, 2024 by hynky1999

[BUG] Can not load deutsche-telekom/Ger-RAG-eval dataset.
Labels: bug
#278 opened Aug 23, 2024 by PhilipMay

[BUG] Zero accuracy in Hellaswag for Llama-2-7b (using 8bit quantization)
Labels: bug
#275 opened Aug 21, 2024 by rankofootball

[FT] IFEval and extended tasks are not in the test suite
Labels: feature request
#261 opened Aug 14, 2024 by clefourrier

[FT] Detect max length from perplexity
Labels: feature request, low prio
#257 opened Aug 13, 2024 by clefourrier

[FT] Add tool usage benchmarks
Labels: feature request
#256 opened Aug 13, 2024 by NathanHB
