FlagEval

FlagEval, launched by BAAI in 2023, is a large-model evaluation system covering more than 800 open-source and closed-source models from around the world. It assesses more than 40 capability dimensions, including reasoning, mathematical skills, and task-solving abilities, across five major tasks and four categories of metrics.

Recent Developments

In 2024, FlagEval added two platforms, the Colosseum and the Debate Arena. Both are dedicated to head-to-head, model-to-model competition, with the aim of driving continuous improvement.
