CommVQA: Situating Visual Question Answering in Communicative Contexts
CoRR (2024)
Abstract
Current visual question answering (VQA) models tend to be trained and
evaluated on image-question pairs in isolation. However, the questions people
ask depend on their informational needs and prior knowledge about the
image content. To evaluate how situating images within naturalistic contexts
shapes visual questions, we introduce CommVQA, a VQA dataset consisting of
images, image descriptions, real-world communicative scenarios where the image
might appear (e.g., a travel website), and follow-up questions and answers
conditioned on the scenario. We show that CommVQA poses a challenge for current
models. Providing contextual information to VQA models improves performance
broadly, highlighting the relevance of situating systems within a communicative
scenario.