Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
CoRR (2023)
Abstract
Multimodal large language models have made significant advancements in recent
years, yet they still suffer from a common issue known as the "hallucination
problem", in which the models generate textual descriptions that inaccurately
depict or entirely fabricate content from associated images. This paper
introduces a novel solution, Hallucination-Aware Direct Preference Optimization
(HA-DPO), which reframes the hallucination problem as a preference selection
task. The model is trained to favor the non-hallucinating response when
presented with two responses of the same image (one accurate and one
hallucinatory). Furthermore, this paper proposes an efficient pipeline for
constructing positive (non-hallucinatory) and negative (hallucinatory) sample
pairs, ensuring a high-quality, style-consistent dataset for robust preference
learning. When applied to three mainstream multimodal models, HA-DPO
significantly reduced hallucination issues and amplified the models'
generalization capabilities. Notably, the MiniGPT-4 model, when enhanced with
HA-DPO, demonstrated a substantial improvement: POPE accuracy rose from 51.13%
to 86.13% (an absolute improvement of 35%), and the MME score rose from
932.00 to 1326.46 (a relative improvement of 42.32%). The code and
datasets are made accessible at https://opendatalab.github.io/HA-DPO.
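The preference-selection training described above corresponds to Direct Preference Optimization over (non-hallucinatory, hallucinatory) response pairs. A minimal sketch of the standard DPO loss for one such pair, in plain Python with illustrative argument names (not the authors' implementation):

```python
import math

def dpo_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Standard DPO loss for one (positive, negative) response pair.

    logp_pos / logp_neg        : summed log-probability of the non-hallucinatory
                                 / hallucinatory response under the policy model
    ref_logp_pos / ref_logp_neg: the same quantities under the frozen reference
                                 model (the model before preference tuning)
    beta                       : temperature controlling deviation from the
                                 reference model
    """
    # Implicit reward margin: how much more the policy prefers the
    # non-hallucinatory response than the reference model does.
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    # Negative log-sigmoid of the margin; minimized by increasing the
    # relative likelihood of the non-hallucinatory response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree, the margin is zero and the loss equals log 2; pushing probability toward the accurate response drives the loss toward zero.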