# FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training
Anjia Cao, Xing Wei, Zhiheng Ma
Paper | Model | Data
## News

- [2024/11/28] Model weights released on Hugging Face.
- [2024/11/28] Evaluation code released.
- [2024/11/18] Paper released on arXiv.
## Highlights

- 🔥 Leverages frozen LLMs to naturally process long text inputs (see the sketch after this list).
- 🔥 Generalizes from monolingual training to multilingual evaluation.
- 🔥 Strong improvements on long/short-context image-text retrieval, on image classification, and in multilingual scenarios.
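The first highlight is the core idea: a frozen decoder-only LLM serves as the text encoder, so long captions fit naturally within the LLM's context window. Below is a minimal sketch of one way to extract a text embedding from a frozen LLM, in the style of PromptEOL (one of the works acknowledged at the end of this README); it is an illustration under stated assumptions, not FLAME's actual implementation, and the model name is a placeholder.

```python
# Illustrative sketch only, not FLAME's code: PromptEOL-style text
# embedding from a frozen decoder-only LLM. The template asks the LLM
# to compress the sentence's meaning into its next token, so the
# hidden state at the last position acts as a sentence embedding.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # placeholder; any decoder-only LLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
llm = AutoModel.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
llm.eval()  # frozen: inference only, no fine-tuning

@torch.no_grad()
def embed_text(text: str) -> torch.Tensor:
    prompt = f'This sentence : "{text}" means in one word: "'
    inputs = tokenizer(prompt, return_tensors="pt")
    hidden = llm(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden[0, -1]  # last-token hidden state as the embedding
```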
## TODO

- Release training code and data.
- Release evaluation code.
- Release pre-trained checkpoints.
## Installation

```bash
git clone https://github.com/MIV-XJTU/FLAME.git
cd FLAME
conda create -n flame python=3.10 -y
conda activate flame
make install
make install-training
make install-test
```
## Evaluation

See Evaluation.md.
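For a flavor of what a zero-shot retrieval run can look like, here is a hedged example using the CLIP_benchmark CLI (acknowledged at the end of this README); the dataset, model tag, and checkpoint path are illustrative, and Evaluation.md remains the authoritative reference.

```bash
# Illustrative only; see Evaluation.md for the supported commands.
clip_benchmark eval \
    --dataset mscoco_captions \
    --task zeroshot_retrieval \
    --model ViT-B-16 \
    --pretrained /path/to/flame_checkpoint.pt \
    --output result.json
```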
## Model Zoo

| Dataset | Model | Checkpoints |
|---|---|---|
| CC3M | ViT-B/16 | Hugging Face |
| CC3M | ViT-L/14 | TODO |
| YFCC15M | ViT-B/16 | Hugging Face |
| YFCC15M | ViT-L/14 | TODO |
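A hedged loading sketch for the released checkpoints: since FLAME builds on open_clip, the example below assumes the checkpoints load through open_clip's standard API. The repo id, filename, and model tag are placeholders, not the released names; follow the Hugging Face links in the table above for the real ones.

```python
# Hypothetical loading sketch; repo id, filename, and model tag are
# placeholders. Assumes an open_clip-compatible checkpoint format.
import open_clip
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(
    repo_id="MIV-XJTU/FLAME",         # placeholder repo id
    filename="flame_vitb16_cc3m.pt",  # placeholder filename
)
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained=ckpt
)
model.eval()
```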
## License

This project is released under a standard Creative Commons CC-BY-4.0 license.
## Citation

If you find our work helpful for your research, please consider giving us a star and citing our paper:

```bibtex
@article{cao2024flame,
  title={FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training},
  author={Cao, Anjia and Wei, Xing and Ma, Zhiheng},
  journal={arXiv preprint arXiv:2411.11927},
  year={2024}
}
```
## Acknowledgements

This project is built on open_clip; we thank the authors for their excellent work. We also thank CLIP_benchmark, DreamLIP, Long-CLIP, and PromptEOL for their code.