Commit

Update README.md
KerolosAtef authored Oct 2, 2024
1 parent 9538547 commit e21c3a1
Showing 1 changed file with 4 additions and 19 deletions.
23 changes: 4 additions & 19 deletions README.md
@@ -3,24 +3,10 @@

# Overview
![InfiniBench teaser figure](repo_imags/teaser_fig_new.jpg)
-Understanding long videos, ranging from tens
-of minutes to several hours, presents unique
-challenges in video comprehension. Despite
-the increasing importance of long-form video
-content, existing benchmarks primarily focus
-on shorter clips. To address this gap, we introduce InfiniBench a comprehensive benchmark for very long video understanding,
-which presents 1) The longest video duration,
-averaging 76.34 minutes; 2) The largest number of question-answer pairs, 108.2K; 3) Diversity in questions that examine nine different
-skills and include both multiple-choice questions and open-ended questions; 4) Humancentric, as the video sources come from movies
-and daily TV shows, with specific human-level
-question designs such as Movie Spoiler Questions that require critical thinking and comprehensive understanding. Using InfiniBench, we
-comprehensively evaluate existing Large MultiModality Models (LMMs) on each skill, including the commercial model Gemini 1.5 Flash
-and the open-source models. The evaluation
-shows significant challenges in our benchmark.Our results show that the best AI models such
-Gemini struggles to perform well with 42.72%
-average accuracy and 2.71 out of 5 average
-score. We hope this benchmark will stimulate the LMMs community towards long video
-and human-level understanding.
+Understanding long videos, ranging from tens of minutes to several hours, presents unique challenges in video comprehension. Despite the increasing importance of long-form video content, existing benchmarks primarily focus on shorter clips. To address this gap, we introduce InfiniBench a comprehensive benchmark for very long video understanding which presents 1)The longest video duration, averaging 52.59 minutes per video. 2) The largest number of question-answer pairs, 108.2K; 3) Diversity in questions that examine nine different skills and include both multiple-choice questions and open-ended questions; 4) Human-centric, as the video sources come from movies and daily TV shows, with specific human-level question designs such as Movie Spoiler Questions that require critical thinking and comprehensive understanding. Using InfiniBench, we comprehensively evaluate existing Large Multi-Modality Models (LMMs) on each skill, including the commercial models such as GPT-4o and Gemini 1.5 Flash and the open-source models.
+The evaluation shows significant challenges in our benchmark.
+Our findings reveal that even leading AI models like GPT-4o and Gemini 1.5 Flash face challenges in achieving high performance in long video understanding, with average accuracies of just 49.16\% and 42.72\%, and average scores of 3.22 and 2.71 out of 5, respectively.
+We hope this benchmark will stimulate the LMMs community towards long video and human-level understanding.
# Leaderboard for top commercial and open souce models:
![results_1](repo_imags/results_1.JPG)
# High level aggregated skills:
@@ -31,7 +17,6 @@ and human-level understanding.
![benchmark_statistics_1](repo_imags/statistics_1_with_desc.JPG)

![benchmark_statistics_2](repo_imags/statistics_2_with_desc.JPG)

# How to download videos
1- TVQA videos <br>
Download the original TVQA videos for short videos from [here](https://tvqa.cs.unc.edu/download_tvqa.html)<br>