Hallucination of Multimodal Large Language Models: A Survey
arXiv (2024)
Abstract
This survey presents a comprehensive analysis of the phenomenon of
hallucination in multimodal large language models (MLLMs), also known as Large
Vision-Language Models (LVLMs), which have demonstrated significant
advancements and remarkable abilities in multimodal tasks. Despite these
promising developments, MLLMs often generate outputs that are inconsistent with
the visual content, a challenge known as hallucination, which poses substantial
obstacles to their practical deployment and raises concerns regarding their
reliability in real-world applications. This problem has attracted increasing
attention, prompting efforts to detect and mitigate such inaccuracies. We
review recent advances in identifying, evaluating, and mitigating these
hallucinations, offering a detailed overview of the underlying causes,
evaluation benchmarks, metrics, and strategies developed to address this issue.
Additionally, we analyze the current challenges and limitations, formulating
open questions that delineate potential pathways for future research. By
presenting a granular classification and landscape of hallucination causes,
evaluation benchmarks, and mitigation methods, this survey aims to deepen the
understanding of hallucinations in MLLMs and inspire further advances in
the field. Through this thorough review, we contribute to the
ongoing dialogue on enhancing the robustness and reliability of MLLMs,
providing valuable insights and resources for researchers and practitioners
alike. Resources are available at:
https://github.com/showlab/Awesome-MLLM-Hallucination.