Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model
Hao Cheng*, Erjia Xiao*, Jindong Gu, Le Yang, Jinhao Duan, Jize Zhang, Jiahang Cao, Kaidi Xu, Renjing Xu†
HKUST & University of Oxford & Drexel University & Xi'an Jiaotong University
(*Equal contribution; †Corresponding author)
- Please follow the instructions in LLaVA, InstructBLIP, and MiniGPT-4 to set up the codebase, model weights, and conda environment for further experiments.
- Download the Typographic Dataset (a sketch of what a typographic sample looks like follows these steps).
- Clone this repository into the codebase mentioned above. For instance, after installing LLaVA:

```bash
cd LLaVA
git clone https://github.com/ChaduCheng/TypoDeceptions.git
```
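
To make the dataset format concrete, the sketch below shows how a typographic attack sample can be constructed: a misleading class label is simply rendered on top of an otherwise ordinary image. This is a minimal illustration under stated assumptions, not the repository's generation code; the file names, label text, font, and placement are placeholders.

```python
from PIL import Image, ImageDraw, ImageFont

def add_typographic_text(image_path: str, text: str,
                         position=(10, 10), font_size=40) -> Image.Image:
    """Render a deceptive text label on top of an image."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    try:
        # Any available TrueType font works; fall back to PIL's default.
        font = ImageFont.truetype("DejaVuSans-Bold.ttf", font_size)
    except OSError:
        font = ImageFont.load_default()
    draw.text(position, text, fill="white", font=font)
    return image

if __name__ == "__main__":
    # Hypothetical example: a dog photo deceptively labeled "cat".
    add_typographic_text("dog.jpg", "cat").save("dog_typo_cat.jpg")
```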
This repository builds on the following projects:

- LLaVA: Large Language and Vision Assistant
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
- CLIP: Learning Transferable Visual Models From Natural Language Supervision
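
Since CLIP serves as the vision encoder behind many of these models, its zero-shot predictions offer a quick view of the typographic vulnerability itself. Below is a minimal sketch, assuming the Hugging Face transformers package and the hypothetical image files from the snippet above; it is not the repository's evaluation pipeline. On vulnerable encoders, the overlaid word is expected to shift probability mass toward the deceptive class.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

labels = ["a photo of a dog", "a photo of a cat"]

# Hypothetical files: a clean image and its typographic counterpart
# produced by the earlier snippet.
for name in ["dog.jpg", "dog_typo_cat.jpg"]:
    image = Image.open(name)
    inputs = processor(text=labels, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    print(name, {l: round(p.item(), 3) for l, p in zip(labels, probs)})
```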
If you find our work useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{cheng2024unveiling,
  title={Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model},
  author={Cheng, Hao and Xiao, Erjia and Gu, Jindong and Yang, Le and Duan, Jinhao and Zhang, Jize and Cao, Jiahang and Xu, Kaidi and Xu, Renjing},
  journal={ECCV},
  year={2024}
}
```