HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
CoRR(2023)
Abstract
We introduce HallusionBench, a comprehensive benchmark designed for the
evaluation of image-context reasoning. This benchmark presents significant
challenges to advanced large visual-language models (LVLMs), such as
GPT-4V(Vision), Gemini Pro Vision, and LLaVA-1.5, by emphasizing nuanced
understanding and interpretation of visual data. The benchmark comprises 346
images paired with 1129 questions, all meticulously crafted by human experts.
We introduce a novel structure for these visual questions designed to establish
control groups. This structure enables us to conduct a quantitative analysis of
the models' response tendencies, logical consistency, and various failure
modes. In our evaluation on HallusionBench, we benchmarked 14 different models,
highlighting a 31.42% accuracy achieved by GPT-4V. Notably, all other evaluated
models achieve accuracy below 16%.
Moreover, our analysis not only highlights the observed failure modes,
including language hallucination and visual illusion, but also deepens an
understanding of these pitfalls. Our comprehensive case studies within
HallusionBench shed light on the challenges of hallucination and illusion in
LVLMs. Based on these insights, we suggest potential pathways for their future
improvement. The benchmark and codebase can be accessed at
https://github.com/tianyi-lab/HallusionBench.
Keywords
hallusionbench,benchmark,image-context,multi-modality