Chrome Extension
WeChat Mini Program
Use on ChatGLM

Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens.

Lv Tang,Peng-Tao Jiang, Zhihao Shen,Hao Zhang, Jinwei Chen,Bo Li

CoRR(2023)

Cited 0|Views14
No score
Abstract
Large Vision-Language Model (LVLM) has seen burgeoning development and increasing attention recently. In this paper, we propose a novel framework, camo-perceptive vision-language framework (CPVLF), to explore whether LVLM can generalize to the challenging camouflaged object detection (COD) scenario in a training-free manner. During the process of generalization, we find that due to hallucination issues within LVLM, it can erroneously perceive objects in camouflaged scenes, producing counterfactual concepts. Moreover, as LVLM is not specifically trained for the precise localization of camouflaged objects, it exhibits a degree of uncertainty in accurately pinpointing these objects. Therefore, we propose chain of visual perception, which enhances LVLM's perception of camouflaged scenes from both linguistic and visual perspectives, reducing the hallucination issue and improving its capability in accurately locating camouflaged objects. We validate the effectiveness of CPVLF on three widely used COD datasets, and the experiments show the potential of LVLM in the COD task.
More
Translated text
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined