Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing
CoRR(2024)
摘要
Pretrained large Vision-Language models have drawn considerable interest in
recent years due to their remarkable performance. Despite considerable efforts
to assess these models from diverse perspectives, the extent of visual cultural
awareness in the state-of-the-art GPT-4V model remains unexplored. To tackle
this gap, we extensively probed GPT-4V using the MaRVL benchmark dataset,
aiming to investigate its capabilities and limitations in visual understanding
with a focus on cultural aspects. Specifically, we introduced three visual
related tasks, i.e. caption classification, pairwise captioning, and culture
tag selection, to systematically delve into fine-grained visual cultural
evaluation. Experimental results indicate that GPT-4V excels at identifying
cultural concepts but still exhibits weaker performance in low-resource
languages, such as Tamil and Swahili. Notably, through human evaluation, GPT-4V
proves to be more culturally relevant in image captioning tasks than the
original MaRVL human annotations, suggesting a promising solution for future
visual cultural benchmark construction.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要