SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization [[Paper](https://arxiv.org/abs/2411.11909)]
As language models continue to scale, Large Language Models (LLMs) have exhibited remarkable capabilities in In-Context Learning (ICL), enabling them to solve language tasks by prefixing the input with a few in-context demonstrations (ICDs). Inspired by these advancements, researchers have extended these techniques to Large Multimodal Models (LMMs) with ICL capabilities.
However, existing LMMs face a critical issue: they often fail to effectively leverage the visual context in multimodal demonstrations and instead simply follow textual patterns. This highlights a lack of alignment between multimodal demonstrations and model outputs.
To address this issue, we propose Symbol Demonstration Direct Preference Optimization (SymDPO). SymDPO breaks the traditional paradigm of constructing multimodal demonstrations by replacing text answers in examples with random symbols, forcing the model to carefully understand the demonstration images and establish relationships between images and symbols to answer questions correctly.
We validate the effectiveness of SymDPO on multiple benchmarks, demonstrating its ability to enhance the multimodal context understanding of LMMs and improve their ability to answer questions accurately.
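For intuition, here is a minimal, purely illustrative sketch of the core idea behind a symbol demonstration: the textual answers inside the in-context demonstrations are swapped for random symbol strings, so the model can only answer the query correctly by grounding those symbols in the demonstration images. The data format and helper name below are assumptions for illustration, not the repository's actual data pipeline.

```python
import random
import string

def symbolize_demonstrations(demonstrations):
    """Replace each demonstration's textual answer with a random symbol string.

    `demonstrations` is assumed to be a list of dicts with keys
    "image", "question", and "answer" (an illustrative format, not the repo's).
    Returns the symbolized demonstrations and the answer -> symbol mapping,
    so the expected query answer can be expressed in the same symbol space.
    """
    answer_to_symbol = {}
    symbolized = []
    for demo in demonstrations:
        answer = demo["answer"]
        if answer not in answer_to_symbol:
            # e.g. "qzkfw": a meaningless token that cannot be guessed from text patterns alone
            answer_to_symbol[answer] = "".join(random.choices(string.ascii_lowercase, k=5))
        symbolized.append({**demo, "answer": answer_to_symbol[answer]})
    return symbolized, answer_to_symbol
```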
- Clone the repository:

  ```bash
  git clone https://github.com/APiaoG/SymDPO.git
  cd SymDPO
  ```

- Install dependencies:

  ```bash
  conda env create -f environment.yml
  ```
Here’s how to use the model for inference:

- Prepare your multimodal examples (including images and text answers).
- Download the model weights. (The model weights trained with SymDPO will be released soon.)
- Modify the file paths in `open_flamingo/scripts/inference.py` accordingly.
- Run `inference.py` to perform inference:

```bash
conda activate SymDPO
python open_flamingo/scripts/inference.py
```
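If you would rather call the model from your own script instead of editing `inference.py`, the sketch below follows the standard OpenFlamingo generation API (`create_model_and_transforms` plus `model.generate` with `vision_x`/`lang_x`). The checkpoint path, image files, and prompt text are placeholders, and the exact model configuration and prompt format used by this repository may differ:

```python
import torch
from PIL import Image
from open_flamingo import create_model_and_transforms

# Build the OpenFlamingo-3B-Instruct architecture (configuration assumed; adjust to your checkpoint).
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="anas-awadalla/mpt-1b-redpajama-200b-dolly",
    tokenizer_path="anas-awadalla/mpt-1b-redpajama-200b-dolly",
    cross_attn_every_n_layers=1,
)
# Placeholder path: the SymDPO-trained weights will be released by the authors.
model.load_state_dict(torch.load("path/to/symdpo_checkpoint.pt"), strict=False)

# Two in-context demonstrations plus one query image.
images = [Image.open(p) for p in ("demo1.jpg", "demo2.jpg", "query.jpg")]
# vision_x shape: (batch, num_media, num_frames, channels, height, width)
vision_x = torch.stack([image_processor(im) for im in images]).unsqueeze(1).unsqueeze(0)

tokenizer.padding_side = "left"
lang_x = tokenizer(
    ["<image>Question: What animal is this? Short answer: cat<|endofchunk|>"
     "<image>Question: What color is the bus? Short answer: red<|endofchunk|>"
     "<image>Question: What is the person holding? Short answer:"],
    return_tensors="pt",
)

generated = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
    num_beams=3,
)
print(tokenizer.decode(generated[0]))
```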
To use SymDPO to train the model built upon OpenFlamingo, follow these steps:
Modify the file paths in `open_flamingo/scripts/train_dpo.sh`, then run the following command:

```bash
sh open_flamingo/scripts/train_dpo.sh
```
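For context, `train_dpo.sh` optimizes a preference objective in the DPO family. Below is a minimal sketch of the standard DPO loss that SymDPO builds on, computed from per-sequence answer log-probabilities under the policy and a frozen reference model; it is a generic reference implementation, not the repository's training code. In SymDPO, the preferred response is the symbolized answer grounded in the demonstration images, while the rejected one merely follows the textual pattern.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * (delta_policy - delta_ref)),
    where delta = log p(chosen) - log p(rejected), summed over answer tokens.
    All inputs are 1-D tensors of per-sequence log-probabilities.
    """
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```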
We also provide code for SFT. Please follow these steps:
Modify the file paths in `open_flamingo/scripts/train_sft.sh`, then run the following command:

```bash
sh open_flamingo/scripts/train_sft.sh
```
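For reference, here is a minimal sketch of the standard SFT objective such a script typically optimizes: next-token cross-entropy on the answer tokens, with the in-context demonstration and prompt tokens masked out of the loss. The tensor layout is illustrative, not the repository's actual code.

```python
import torch.nn.functional as F

def sft_loss(logits, labels, ignore_index=-100):
    """Causal-LM SFT loss. `labels` should hold `ignore_index` at prompt /
    demonstration positions so that only answer tokens are supervised."""
    # Shift so that the logits at position t predict the token at position t+1.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )
```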
To evaluate the model, follow these steps:

- Download the files required for the following datasets:

  - COCO
  - Flickr-30K
  - VQAv2
  - OKVQA
  - TextVQA

- Modify the file paths in `open_flamingo/scripts/run_eval.sh`, then run the following command:

  ```bash
  sh open_flamingo/scripts/run_eval.sh
  ```
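For the VQA-style benchmarks (VQAv2, OKVQA, TextVQA), scoring is typically done with the standard VQA accuracy metric, a simplified form of which is sketched below; this is a generic reimplementation for reference, not the repository's evaluation code:

```python
def vqa_accuracy(prediction, ground_truth_answers):
    """Simplified VQA accuracy: an answer counts as fully correct if at least
    3 of the (typically 10) human annotators gave the same answer.
    Both the prediction and the references are assumed to be pre-normalized strings.
    """
    matches = sum(ans == prediction for ans in ground_truth_answers)
    return min(matches / 3.0, 1.0)
```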
Here is a comparison of results across multiple benchmarks between OpenFlamingo-3B-Instruct trained with SymDPO and OpenFlamingo-3B-Instruct trained without it. Detailed results and methodology can be found in the paper.
If you have any questions or suggestions, feel free to reach out:
- GitHub Issues: Create an issue
- Email: [email protected]
If you found this repository useful, please consider citing:
```bibtex
@misc{jia2024symdpoboostingincontextlearning,
      title={SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization},
      author={Hongrui Jia and Chaoya Jiang and Haiyang Xu and Wei Ye and Mengfan Dong and Ming Yan and Ji Zhang and Fei Huang and Shikun Zhang},
      year={2024},
      eprint={2411.11909},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.11909},
}

@article{awadalla2023openflamingo,
      title={OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models},
      author={Anas Awadalla and Irena Gao and Josh Gardner and Jack Hessel and Yusuf Hanafy and Wanrong Zhu and Kalyani Marathe and Yonatan Bitton and Samir Gadre and Shiori Sagawa and Jenia Jitsev and Simon Kornblith and Pang Wei Koh and Gabriel Ilharco and Mitchell Wortsman and Ludwig Schmidt},
      journal={arXiv preprint arXiv:2308.01390},
      year={2023}
}
```