Although some weights can be downloaded automatically at runtime, we recommend pre-downloading them to speed up each run.
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth
The path of the image encoder weights can be modified here.
# InstructBLIP (recommended)
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/InstructBLIP/instruct_blip_vicuna7b_trimmed.pth
# MiniGPT4
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xxl.pth
wget https://huggingface.co/Vision-CAIR/MiniGPT-4/resolve/main/pretrained_minigpt4.pth
The paths of the Q-Former and linear weights can be modified in q_former_model and ckpt in each config here.
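An interrupted download, or a URL that serves a web page rather than the file itself, leaves a file that only fails later when the model is loaded. The following is a minimal sketch of a sanity check that can be run after downloading. The helper name `check_weights` and the `min_bytes` threshold are illustrative assumptions, not part of this repository.

```python
from pathlib import Path


def check_weights(paths, min_bytes=1_000_000):
    """Return {path: problem} for files that are unlikely to be valid checkpoints.

    Hypothetical helper: it only inspects existence, the first bytes, and the
    file size. It does not load or validate the checkpoint contents.
    """
    problems = {}
    for path in map(Path, paths):
        if not path.is_file():
            problems[str(path)] = "missing"
            continue
        # A saved web page starts with an HTML preamble instead of binary data.
        with path.open("rb") as handle:
            head = handle.read(64).lstrip().lower()
        if head.startswith((b"<!doctype", b"<html")):
            problems[str(path)] = "looks like an HTML page"
            continue
        # Real checkpoints are large; a tiny file suggests a truncated download.
        if path.stat().st_size < min_bytes:
            problems[str(path)] = "too small"
    return problems
```

For example, `check_weights(["eva_vit_g.pth", "instruct_blip_vicuna7b_trimmed.pth"])` returns an empty dictionary when both files are present and look plausible.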
Please first follow the instructions to prepare Vicuna v1.1 (for InstructBLIP) or Vicuna v1.0 (for MiniGPT4).
Then set llama_model in each config here to the folder that contains the Vicuna weights.
We follow VideoChat2 to maintain consistency in the format of each instruction dataset. Please follow the source instructions to prepare the videos and annotations for each dataset. Then modify the path for each dataset here.
Please note:
(1) You do not need to prepare every dataset; prepare only the datasets used by the configs you plan to run.
(2) The annotations for videochat11k and videochatgpt100k are slightly different from the source, which can be found here.
Please first modify the path in the train script to point to the desired config from the config folder, then run
bash script/train/train.sh
Please first modify the checkpoint path and annotation path in the test script, then run
bash script/inference/mvbench/test_mvbench.sh
All evaluation scripts can be found here.
For instance, to evaluate the temporal score on the VideoChatGPT benchmark, we first run inference to get the prediction results:
bash script/inference/vcgbench/test_temporal.sh
and then execute the corresponding evaluation script to perform benchmarking:
bash script/inference/vcgbench/score_temporal.sh
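Because scoring depends on the predictions written by the inference step, it can help to chain the two so that scoring never runs on stale or missing results. The sketch below is one way to do this with the standard library. `run_steps` is a hypothetical helper, not part of this repository.

```python
import subprocess


def run_steps(steps):
    """Run each command in order and stop at the first failure.

    `steps` is a list of argument lists. `check=True` makes subprocess.run
    raise CalledProcessError on a non-zero exit code, so later steps are
    skipped when an earlier one fails.
    """
    finished = []
    for command in steps:
        subprocess.run(command, check=True)
        finished.append(command)
    return finished
```

For the temporal score this would be called as `run_steps([["bash", "script/inference/vcgbench/test_temporal.sh"], ["bash", "script/inference/vcgbench/score_temporal.sh"]])`. The same pattern applies to any other inference and scoring script pair.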
All testing procedures are the same as for VCGbench; all evaluation scripts can be found here.
For instance, to evaluate the result on MSVD, we first run
bash script/inference/qabench/msvd_qa.sh
and then run
bash script/inference/qabench/score_msvd.sh