This repository contains the survey paper "Object Hallucinations in Multimodal Large Language Models: A Survey". The paper examines object hallucinations in Multimodal Large Language Models (MLLMs), focusing on their causes, implications, and mitigation strategies.
MLLMs have made remarkable progress on tasks that integrate textual and visual data. However, they often generate outputs that are inconsistent with the provided visual content, raising concerns about their reliability. This survey maps the current research landscape on hallucinations in MLLMs, consolidating existing knowledge and identifying directions for future work.
In-depth Exploration: The survey examines hallucinations in MLLMs in depth, showing how they undermine model reliability and real-world applications, and underscores the need to address them before deployment in sensitive domains.
Categorization of Hallucinations (see the code sketch below):
- Object Category: Incorrect identification of objects, e.g., describing an object that is not present in the image.
- Object Attribute: Descriptions of an object's attributes (e.g., color or count) that do not match the visual content.
- Object Relation: Inaccurate representation of relations between objects, such as spatial arrangement or interaction.

Research Scope: The survey covers hallucinations arising from data, model architecture, training processes, and inference mechanisms.
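To make the taxonomy above concrete, here is a minimal illustrative Python sketch, not code or an evaluation protocol from the survey: it enumerates the three object hallucination categories and flags candidate object-category hallucinations by comparing the objects mentioned in a generated caption against ground-truth annotations. The names `HallucinationType`, `CaptionAnnotation`, and `find_category_hallucinations` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class HallucinationType(Enum):
    """The three object-level hallucination categories discussed in the survey."""
    CATEGORY = "object_category"    # a nonexistent object is mentioned
    ATTRIBUTE = "object_attribute"  # an attribute conflicts with the image
    RELATION = "object_relation"    # a relation between objects is wrong


@dataclass
class CaptionAnnotation:
    """Hypothetical ground-truth annotation for one image."""
    objects: set[str]  # objects actually present in the image


def find_category_hallucinations(mentioned_objects: set[str],
                                 annotation: CaptionAnnotation) -> set[str]:
    """Return mentioned objects absent from the ground-truth set,
    i.e. candidate object-category hallucinations."""
    return mentioned_objects - annotation.objects


if __name__ == "__main__":
    annotation = CaptionAnnotation(objects={"dog", "frisbee", "grass"})
    mentioned = {"dog", "frisbee", "person"}  # objects extracted from a generated caption
    print(find_category_hallucinations(mentioned, annotation))  # {'person'}
```

Note that this simple set comparison only covers the object-category case; detecting attribute and relation hallucinations would require richer annotations (e.g., attribute labels or scene graphs) than the plain object set used here.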
License: This work is licensed under a Creative Commons Attribution 4.0 International License. You may use and share this work as long as appropriate credit is given.

For any inquiries or collaborations, feel free to reach out via email. Thank you for your interest in this research!