Evaluate the model with OpenAI Evals #12

Open
walking-octopus opened this issue May 21, 2023 · 0 comments
@walking-octopus

After releasing GPT-4, OpenAI faced a significant challenge: there weren't many benchmarks for LLMs focused on emergent capabilities like translation, reasoning, pattern identification, etc. So they created Evals, a crowdsourced open-source set of benchmarks for LLMs. While somewhat OpenAI-centric, since the submission rules prohibit adding tests that GPT-4 can already consistently pass, it remains a valuable tool for objective model evaluation.

If different open-access LLM projects can switch to a well-designed common benchmark, we may finally get to objectively compare model quality, which I find essential for the future of local LLMs. For example, we could compare this model against WizardLM, raw Vicuna, or GPT-3.5.

For reference on testing non-OpenAI models with Evals, see the OpenAssistant model evals.
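
A rough sketch of what plugging a local model into Evals could look like, based on the completion-function interface described in the Evals docs (`docs/completion-fns.md` in `openai/evals`). The import path, the local HTTP endpoint, and the wrapper class names below are assumptions for illustration, not tested code:

```python
# Sketch of a custom completion function for OpenAI Evals, so `oaieval`
# can call a locally hosted model the same way it calls an OpenAI one.
# NOTE: interface based on docs/completion-fns.md in openai/evals;
# import paths and the local endpoint are assumptions.
from typing import Union

import requests

from evals.api import CompletionFn, CompletionResult


class LocalCompletionResult(CompletionResult):
    def __init__(self, text: str):
        self.text = text

    def get_completions(self) -> list[str]:
        # Evals grades whatever completion strings are returned here.
        return [self.text]


class LocalCompletionFn(CompletionFn):
    def __init__(self, endpoint: str = "http://localhost:8080/completion", **kwargs):
        # Hypothetical inference server exposing the local model.
        self.endpoint = endpoint

    def __call__(self, prompt: Union[str, list], **kwargs) -> LocalCompletionResult:
        # Evals may pass either a raw string or a chat-style message list.
        if isinstance(prompt, list):
            prompt = "\n".join(m.get("content", "") for m in prompt)
        resp = requests.post(self.endpoint, json={"prompt": prompt}, timeout=120)
        return LocalCompletionResult(resp.json()["text"])
```

Once registered under a name in a registry YAML (roughly `class: my_module:LocalCompletionFn`), the model could then be run against an eval with something like `oaieval <completion_fn_name> test-match --registry_path <path>`, per the Evals README.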
