FlagEval
FlagEval, launched by BAAI in 2023, is a comprehensive evaluation system for large models that covers more than 800 open-source and closed-source models from around the world. It assesses more than 40 capability dimensions, including reasoning, mathematical skills, and task-solving abilities, together with five major tasks and four categories of metrics.
Recent Developments
In 2024, FlagEval added two platforms, the Colosseum and the Debate Arena. Both are dedicated to head-to-head competition between models, with the aim of encouraging continuous improvement.