FlagEval

FlagEval, launched by BAAI in 2023, is a comprehensive evaluation system for large models that covers over 800 open-source and closed-source models from around the world. It spans more than 40 capability dimensions, including reasoning, mathematical skills, and task-solving abilities, along with five major tasks and four categories of metrics.


Recent Developments

In 2024, FlagEval expanded by launching the Colosseum and Debate Arena. Both platforms are dedicated to head-to-head competition between models, with the aim of driving continuous improvement.


Popular repositories

  1. FlagEval (Public)

    FlagEval is an evaluation toolkit for AI large foundation models.

    Python, 301 stars, 28 forks

  2. CMMU (Public)

    [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

    Python, 22 stars

  3. HalluDial (Public)

    Python, 14 stars, 1 fork

  4. FlagEval_Report (Public)

    CSS

  5. .github (Public)

Repositories

  • .github (Public)
    0 stars, 0 forks, 0 issues, 0 pull requests. Updated Nov 8, 2024
  • HalluDial (Public)
    Python, 14 stars, 1 fork, 1 issue, 0 pull requests. Updated Aug 19, 2024
  • FlagEval_Report (Public)
    CSS, 0 stars, 0 forks, 0 issues, 0 pull requests. Updated Jul 18, 2024
  • FlagEval (Public)

    FlagEval is an evaluation toolkit for AI large foundation models.

    Python, 301 stars, Apache-2.0 license, 28 forks, 4 issues, 2 pull requests. Updated Jul 13, 2024
  • CMMU (Public)

    [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

    Python, 22 stars, 0 forks, 0 issues, 0 pull requests. Updated Feb 1, 2024
