We present MMIE, a Massive Multimodal Interleaved understanding Evaluation benchmark, designed for Large Vision-Language Models (LVLMs). MMIE provides a robust framework to assess the interleaved comprehension and generation capabilities of LVLMs across diverse domains, supported by reliable automated metrics.
We host the MMIE dataset on HuggingFace. You should first request access on this page; requests are approved automatically.
Please download all the files in this repository and unzip images.tar.gz to get all the images. We also provide overview.json, an example of our dataset format.
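If you prefer to script the download, a minimal sketch using huggingface_hub is shown below; the repo id MMIE/MMIE and the target directory are assumptions, so use the repo id shown on the dataset page.

# Sketch: download the dataset after access has been granted (repo id is an assumption).
import tarfile
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="MMIE/MMIE",      # assumed repo id; check the dataset page
    repo_type="dataset",
    local_dir="MMIE_data",
)

# Unpack the image archive next to the downloaded files.
with tarfile.open(f"{local_dir}/images.tar.gz", "r:gz") as tar:
    tar.extractall(path=local_dir)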
The data file you want to evaluate should follow this format:
[
    {
        "id": "",
        "question": [
            {
                "text": "...",
                "image": LOCAL_PATH_TO_THE_IMAGE or null
            },
            ...
        ],
        "answer": [
            {
                "text": "...",
                "image": LOCAL_PATH_TO_THE_IMAGE or null
            },
            ...
        ],
        "model": "gt",
        "gt_answer": [
            {
                "text": "...",
                "image": LOCAL_PATH_TO_THE_IMAGE or null
            },
            ...
        ]
    },
    ...
]
Currently, gt_answer is only used for Multi-step Reasoning tasks, but the field is still required in the data format. For other tasks you can set "gt_answer": [{"text": null, "image": null}].
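For illustration, here is a minimal Python sketch that assembles one entry in this format and writes data.json; all field values are placeholders and my_model is a hypothetical model name.

# Sketch: build one to-eval entry and save it as data.json (INPUT_DIR is a placeholder).
import json

entry = {
    "id": "0",
    "question": [
        {"text": "Describe the image and continue the story.", "image": "images/0.png"},
    ],
    "answer": [  # the interleaved output of the model being evaluated
        {"text": "The model's generated text...", "image": None},
        {"text": None, "image": "images/1.png"},
    ],
    "model": "my_model",  # hypothetical model name
    # gt_answer is required; leave it empty for tasks other than Multi-step Reasoning.
    "gt_answer": [{"text": None, "image": None}],
}

with open("INPUT_DIR/data.json", "w") as f:
    json.dump([entry], f, indent=2)  # None is serialized as null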
Make sure the file structure is:
INPUT_DIR
|INPUT_FILE(data.json)
|images
|0.png
|1.png
|...
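A quick, optional sanity check of this layout could look like the sketch below; it assumes the image paths stored in data.json are relative to INPUT_DIR.

# Sketch: verify that INPUT_DIR contains the data file and the referenced images.
import json
import os

def check_input_dir(input_dir: str, input_file: str = "data.json") -> None:
    with open(os.path.join(input_dir, input_file)) as f:
        data = json.load(f)
    for entry in data:
        for turn in entry["question"] + entry["answer"] + entry["gt_answer"]:
            image = turn.get("image")
            if image is not None and not os.path.exists(os.path.join(input_dir, image)):
                print(f"Missing image: {image} (entry {entry['id']})")

check_input_dir("INPUT_DIR")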
- Clone the code from this repo
git clone https://github.com/Lillianwei-h/MMIE
cd MMIE
- Build environment
conda create -n MMIE python=3.11
conda activate MMIE
pip install -r requirements.txt
pip install flash_attn
You can request access to our MMIE-Score model on HuggingFace and refer to the InternVL 2.0 documentation for more details.
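Once access is granted, the scorer can be loaded in the usual InternVL 2.0 style; the sketch below assumes the repo id MMIE/MMIE-Score and bfloat16 weights, so check the model page and the InternVL 2.0 docs for the exact settings.

# Sketch: load the MMIE-Score model following the InternVL 2.0 loading pattern (repo id is an assumption).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "MMIE/MMIE-Score"  # assumed HuggingFace repo id; check the model page
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)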
python main.py --input_dir INPUT_DIR --input_file INPUT_FILE
By default, the output file is written to ./eval_outputs/eval_result.json. You can also use the --output_dir and --output_file arguments to specify a different output location.
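To take a quick look at the results, you can load the output JSON as shown below; the exact schema of eval_result.json is not documented here, so the sketch only pretty-prints the first record.

# Sketch: inspect the evaluation output (schema not specified in this README).
import json

with open("./eval_outputs/eval_result.json") as f:
    results = json.load(f)

# Pretty-print the first record, or the whole object if it is not a list.
print(json.dumps(results[0] if isinstance(results, list) else results, indent=2))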
If you find our benchmark useful in your research, please consider citing us:
@article{xia2024mmie,
title={MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models},
author={Xia, Peng and Han, Siwei and Qiu, Shi and Zhou, Yiyang and Wang, Zhaoyang and Zheng, Wenhao and Chen, Zhaorun and Cui, Chenhang and Ding, Mingyu and Li, Linjie and Wang, Lijuan and Yao, Huaxiu},
journal={arXiv preprint arXiv:2410.10139},
year={2024}
}