
In-Image Learning

Code for the paper "All in an Aggregated Image for In-Image Learning".

[Figure: an In-Image Learning (IIL) case]

Requirements

pip install -r requirements.txt

Download Dataset

The processed dataset and demonstration examples are available from this link. After downloading, unzip the archive and place the dataset directory in the root directory of the project, giving the following layout:

----IIL
    |----dataset
    |----src
    ...
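A quick pre-flight check can catch a misplaced dataset directory before any experiment starts. The bash function below is a minimal sketch and is not part of the repository. Its name, check_layout, is hypothetical. It only verifies that dataset and src sit side by side in the project root, as in the tree above, and it does not inspect the files inside them.

```shell
# Hypothetical helper (not part of the repository): checks that the project
# root contains the dataset/ and src/ directories shown in the layout above.
# It does not validate the contents of those directories.
check_layout() {
  local root="${1:-.}"
  local dir
  for dir in dataset src; do
    if [ ! -d "$root/$dir" ]; then
      # Report the first missing directory and signal failure.
      echo "missing directory: $root/$dir" >&2
      return 1
    fi
  done
  echo "layout ok: $root"
}
```

Running check_layout from the project root (or check_layout /path/to/IIL) prints "layout ok" when both directories are present. Otherwise it names the missing directory on stderr and returns a non-zero status.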

Run In-Image Learning and Baselines

In-Image Learning

python run_iil.py --exp_name exp_on_mv --dataset mathvista --lt few_shot

Visual-Text Interleaved In-Context Learning (baseline)

python run_vticl.py --exp_name exp_on_mv --dataset mathvista --lt few_shot

Text-Only In-Context Learning (baseline)

python run_ticl.py --exp_name exp_on_mv --dataset mathvista --lt few_shot
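All three scripts accept the same flags, so you can run them back to back with identical settings and compare In-Image Learning with both baselines on the same experiment. The bash sketch below is not part of the repository. The function name run_all and the DRY_RUN switch are hypothetical. It passes through only the flags used in the commands above and takes their example values as defaults.

```shell
# Hypothetical helper (not part of the repository): runs In-Image Learning and
# both baselines with the same flags. Defaults mirror the example commands.
# Set DRY_RUN=1 to print each command instead of executing it.
run_all() {
  local exp_name="${1:-exp_on_mv}"
  local dataset="${2:-mathvista}"
  local lt="${3:-few_shot}"
  local script
  for script in run_iil.py run_vticl.py run_ticl.py; do
    local -a cmd=(python "$script" --exp_name "$exp_name" --dataset "$dataset" --lt "$lt")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command line (arguments joined by single spaces).
      echo "${cmd[*]}"
    else
      # Stop at the first script that fails and propagate its exit status.
      "${cmd[@]}" || return
    fi
  done
}
```

After sourcing the file in bash, DRY_RUN=1 run_all prints the three commands with the default settings. run_all exp_on_mv mathvista few_shot runs them for real and stops at the first failure.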

Citation

If you find In-Image Learning useful in your research or applications, please cite it using the following BibTeX:

@misc{wang2024single,
      title={All in a Single Image: Large Multimodal Models are In-Image Learners}, 
      author={Lei Wang and Wanyu Xu and Zhiqiang Hu and Yihuai Lan and Shan Dong and Hao Wang and Roy Ka-Wei Lee and Ee-Peng Lim},
      year={2024},
      eprint={2402.17971},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}