Skip to content

Latest commit

 

History

History
174 lines (111 loc) · 5.85 KB

README.md

File metadata and controls

174 lines (111 loc) · 5.85 KB

SymDPO: Breaking the Bottleneck in Multimodal Example Understanding

Github stars GitHub last commit GitHub pull request

SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization [Paper]


📚 Introduction

As language models continue to scale, Large Language Models (LLMs) have exhibited remarkable capabilities in In-Context Learning (ICL), enabling them to solve language tasks by prefixing a few in-context demonstrations (ICDs). Inspired by these advancements, researchers have extended these techniques to Large Multimodal Models (LMMs) with ICL capabilities.

However, existing LMMs face a critical issue: they often fail to effectively leverage the visual context in multimodal demonstrations and instead simply follow textual patterns. This highlights a lack of alignment between multimodal demonstrations and model outputs.

To address this issue, we propose Symbol Demonstration Direct Preference Optimization (SymDPO). SymDPO breaks the traditional paradigm of constructing multimodal demonstrations by replacing text answers in examples with random symbols, forcing the model to carefully understand the demonstration images and establish relationships between images and symbols to answer questions correctly.

We validate the effectiveness of SymDPO on multiple benchmarks, demonstrating its ability to enhance the multimodal context understanding of LMMs and improve their ability to answer questions accurately.


📦 Installation

  1. Clone the repository:

    git clone https://github.com/APiaoG/SymDPO.git
    cd SymDPO
  2. Install dependencies:

    conda env create -f environment.yml

🚀 Quick Start

Here’s how to use the model to inference:

  1. Prepare your multimodal examples (including images and text answers).
  2. Download the model weights. (The model weights trained with SymDPO will be released soon.)
  3. Modify the file paths in open_flamingo/scripts/inference.py accordingly.
  4. Run the inference.py file to perform inference.
    conda activate SymDPO
    python open_flamingo/scripts/inference.py

🏋️ Training

1️⃣ SymDPO

To use SymDPO to train the model built upon OpenFlamingo, follow these steps:


Modify the file paths in the file open_flamingo/scripts/train_dpo.sh, then run the following command:

sh open_flamingo/scripts/train_dpo.sh

2️⃣ SFT

We also provide code for SFT. Please follow these steps:


Modify the file paths in the file open_flamingo/scripts/train_sft.sh, then run the following command:

sh open_flamingo/scripts/train_sft.sh

3️⃣ Evaluation

  1. Please download the files required for the following dataset:

    • COCO
    • Flickr-30K
    • VQAv2
    • OKVQA
    • TextVQA
  2. Modify the file paths in the file open_flamingo/scripts/run_eval.sh, then run the following command:

    sh open_flamingo/scripts/run_eval.sh

🧪 Example Usage

Here is a comparison of results between OpenFlamingo-3B-Instruct trained with SymDPO and OpenFlamingo-3B-Instruct not trained with SymDPO:

example

📊 Benchmark Results

Performance comparison across multiple benchmarks:

performence

Detailed results and methodology can be found in the paper.


📧 Contact

If you have any questions or suggestions, feel free to reach out:


📝 Citing

If you found this repository useful, please consider citing:

@misc{jia2024symdpoboostingincontextlearning,
      title={SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization}, 
      author={Hongrui Jia and Chaoya Jiang and Haiyang Xu and Wei Ye and Mengfan Dong and Ming Yan and Ji Zhang and Fei Huang and Shikun Zhang},
      year={2024},
      eprint={2411.11909},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.11909}, 
}
@article{awadalla2023openflamingo,
  title={OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models},
  author={Anas Awadalla and Irena Gao and Josh Gardner and Jack Hessel and Yusuf Hanafy and Wanrong Zhu and Kalyani Marathe and Yonatan Bitton and Samir Gadre and Shiori Sagawa and Jenia Jitsev and Simon Kornblith and Pang Wei Koh and Gabriel Ilharco and Mitchell Wortsman and Ludwig Schmidt},
  journal={arXiv preprint arXiv:2308.01390},
  year={2023}
}