
[EVAL] Add ArenaHardAuto #325

Open
lewtun opened this issue Sep 23, 2024 · 0 comments

lewtun commented Sep 23, 2024

Evaluation short description

  • Why is this evaluation interesting?

Many benchmarks are becoming saturated by newer models. LMSYS has crowd-sourced a variety of hard prompts from the community, and performance on them correlates strongly with Chatbot Arena Elo scores (a sketch of how such scores are aggregated is included after this list).

  • How used is it in the community?

Recent papers are starting to report ArenaHard as a core metric for measuring improvements from new post-training methods. It is also becoming an alternative to MT-Bench due to its difficulty and real-world source of prompts.
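
For reference, here is a minimal sketch of the kind of scoring ArenaHard-Auto-style evaluations rely on: an LLM judge compares the candidate model's answer against a fixed baseline's answer on each hard prompt, and the per-prompt verdicts are aggregated into a win rate with a bootstrapped confidence interval. The verdict labels and record layout below are assumptions for illustration, not the actual ArenaHard or lighteval data format.

```python
import random
from typing import List, Tuple


def win_rate(verdicts: List[str]) -> float:
    """Win rate of the candidate model, counting ties as half a win."""
    score = sum(1.0 if v == "candidate" else 0.5 if v == "tie" else 0.0 for v in verdicts)
    return score / len(verdicts)


def bootstrap_ci(verdicts: List[str], n_resamples: int = 1000, alpha: float = 0.05) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval over per-prompt verdicts."""
    rates = sorted(
        win_rate(random.choices(verdicts, k=len(verdicts)))
        for _ in range(n_resamples)
    )
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return rates[lo_idx], rates[hi_idx]


if __name__ == "__main__":
    # Toy judge verdicts: each entry says whether the judge preferred the
    # candidate, the baseline, or called a tie on one hard prompt.
    verdicts = ["candidate"] * 60 + ["baseline"] * 30 + ["tie"] * 10
    print(f"win rate: {win_rate(verdicts):.3f}")
    print("95% CI:", bootstrap_ci(verdicts))
```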

Evaluation metadata

Provide all available
