LMM hallucination😵 refers to cases where an LMM generates content that looks plausible but deviates from or conflicts with the provided image. LMMs tend to rely more on their parametric knowledge than on the visual features they are given, so they answer with guesses and produce multimodal hallucinations.
The MLLM community has developed methods for detecting, evaluating, and mitigating hallucinations👍.
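Before the paper list, a concrete illustration may help: many of the evaluation benchmarks below probe object hallucination with POPE-style yes/no polling (Li et al., EMNLP 2023). The sketch below shows the general idea only; the `model.chat(image, prompt)` interface and the `(image, object, present)` sample format are hypothetical placeholders, not any benchmark's actual API.

```python
# Minimal sketch of POPE-style object-hallucination polling.
# `model` is a hypothetical LVLM wrapper exposing chat(image, prompt) -> str.

def pope_accuracy(model, samples):
    """samples: iterable of (image, object_name, present) where `present`
    is True if the object actually appears in the image."""
    correct = 0
    total = 0
    for image, obj, present in samples:
        prompt = f"Is there a {obj} in the image? Please answer yes or no."
        answer = model.chat(image, prompt).strip().lower()  # hypothetical API
        predicted_present = answer.startswith("yes")
        correct += int(predicted_present == present)
        total += 1
    return correct / max(total, 1)
```

Besides accuracy, POPE-style evaluations typically also report the overall "yes" rate, since a hallucinating model tends to over-answer "yes" for absent objects.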
- FDPO: Detecting and Preventing Hallucinations in Large Vision Language Models, (Gunjal et al., AAAI 2024)
- HaELM: Evaluation and Analysis of Hallucination in Large Vision-Language Models, (Wang et al. 2023a)
- HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision-Language Models for Detailed Caption, (Zhai et al. 2023)
- Unified Hallucination Detection for Multimodal Large Language Models, (Chen et al.)
- POPE: Evaluating Object Hallucination in Large Vision-Language Models, (Li et al. EMNLP 2023)
- HallusionBench: An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models, (Liu et al., CVPR 2024)
- NOPE: Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models, (Lovenia et al.)
- Bingo: Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges, (Cui et al.)
- FaithScore: Evaluating Hallucinations in Large Vision-Language Models, (Jing et al.)
- AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation, (Wang et al.)
- Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models, (Villa et al.)
- Visually Dehallucinative Instruction Generation: Know What You Don't Know, (Cha et al.)
- The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs, (Han et al.)
- Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models, (Jiang et al.)
- Visual Hallucinations of Multi-modal Large Language Models, (Huang et al.)
- PhD: A Prompted Visual Hallucination Evaluation Dataset, (Liu et al.)
- LRV-Instruction: Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning, (Liu et al., ICLR 2024)
- LURE: Analyzing and Mitigating Object Hallucination in Large Vision-Language Models, (Zhou et al., ICLR 2024)
- Woodpecker: Hallucination Correction for Multimodal Large Language Models, (Yin et al.)
- LLaVA-RLHF: Aligning Large Multimodal Models with Factually Augmented RLHF, (Sun et al.)
- Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision, (Lee et al., NAACL 2024)
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data, (Yu et al., CVPR 2024)
- VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding, (Leng et al., CVPR 2024) (see the decoding sketch after this list)
- HA-DPO: Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
- Mitigating Hallucination in Visual Language Models with Visual Supervision, (Chen et al.)
- OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation, (Huang et al. CVPR 2024)
- FOHE: Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites, (Wang et al.)
- RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback, (CVPR 2024)
- MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations, (Ben-Kish et al.)
- HACL: Hallucination Augmented Contrastive Learning for Multimodal Large Language Model, (Jiang et al.)
- Silkie: Preference Distillation for Large Visual Language Models, (Li et al.)
- MMCot: Multimodal Chain-of-Thought Reasoning in Language Models, (Zhang et al.)
- KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning, (Mondal et al. AAAI 2024)
- Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance, (Zhao et al.)
- ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling, (Yan et al.)
- EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models, (Xing et al.)
- Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models, (Han et al.)
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective, (Yue et al.)
  - Observes that overly detailed training data can push model outputs beyond the model's visual perception limits, producing hallucinations
  - Proposes a learning objective that reduces hallucinations by learning from regular instruction data
  - Proposes a data filtering strategy that prevents harmful training data from exacerbating model hallucinations
- Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding, (Deng et al.)
- Navigating Hallucinations for Reasoning of Unintentional Activities, (Grover et al.)
- IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding, (Zhu et al.)
- Pensieve: Retrospect-then-Compare mitigates Visual Hallucination, (Yang et al.)
- M3ID: Multi-Modal Hallucination Control by Visual Information Grounding, (Favero et al. CVPR 2024)
- What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models, (Kim et al.)
- Mitigating Dialogue Hallucination for Large Multi-modal Models via Adversarial Instruction Tuning, (Park et al.)
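Several of the mitigation methods above operate purely at decoding time. For example, VCD (Leng et al., CVPR 2024) contrasts next-token logits conditioned on the original image against logits conditioned on a distorted copy, so that tokens favored only by language priors are penalized. The sketch below is a simplified illustration of that logit-level idea for a single greedy step; the `alpha`/`beta` defaults, the greedy selection, and the 1-D tensor shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def contrastive_decoding_step(logits_clean, logits_distorted, alpha=1.0, beta=0.1):
    """One decoding step in the spirit of visual contrastive decoding.

    logits_clean:     [vocab_size] next-token logits given the original image.
    logits_distorted: [vocab_size] next-token logits given a distorted image
                      (e.g., a heavily noised copy), which amplifies language priors.
    """
    # Adaptive plausibility constraint: keep only tokens whose probability under
    # the clean image is at least beta times the top probability.
    probs_clean = torch.softmax(logits_clean, dim=-1)
    keep = probs_clean >= beta * probs_clean.max()

    # Contrastive objective: boost the clean-image logits and subtract the
    # distorted-image logits, down-weighting tokens driven by priors alone.
    contrasted = (1 + alpha) * logits_clean - alpha * logits_distorted
    contrasted = contrasted.masked_fill(~keep, float("-inf"))

    return torch.argmax(contrasted, dim=-1)  # greedy choice for simplicity
```

In practice this step would be plugged into the model's generation loop, with the distorted-image forward pass run in parallel at each step; sampling strategies other than greedy decoding apply the same contrasted logits.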