Learning to Ask Informative Sub-Questions for Visual Question Answering
IEEE Conference on Computer Vision and Pattern Recognition (2022)
Abstract
VQA (Visual Question Answering) models tend to make incorrect inferences on questions that require reasoning over world knowledge. A recent study has shown that training VQA models with questions that provide lower-level perceptual information, alongside the reasoning questions themselves, improves performance. Inspired by this, we propose a novel VQA model that generates questions to actively obtain auxiliary perceptual information useful for correct reasoning. Our model consists of a VQA model for answering questions, a Visual Question Generation (VQG) model for generating questions, and an Info-score model that estimates how much information a generated question contributes toward answering the original question. We train the VQG model to maximize the "informativeness" estimated by the Info-score model, so that it generates questions containing as much information as possible about the answer to the original question. Our experiments show that by feeding the generated questions and their answers to the VQA model as additional information, it can indeed predict the answer more accurately than the baseline model.
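The three-module pipeline described above (VQG proposes sub-questions, the Info-score model ranks them, and the VQA model answers with the resulting sub-QA pairs as extra context) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: all three models are toy stand-in functions, and the names and interfaces are assumptions made for clarity.

```python
# Hypothetical sketch of the sub-question pipeline from the abstract.
# The three toy functions below stand in for the trained VQG,
# Info-score, and VQA models; none reflect the paper's actual networks.

def toy_vqg(image, question):
    """Stand-in VQG: propose candidate sub-questions about the image."""
    return [f"What color is the object asked about in '{question}'?",
            f"How many objects are relevant to '{question}'?"]

def toy_info_score(image, question, sub_question):
    """Stand-in Info-score: rate a sub-question's usefulness (0..1)."""
    return 0.9 if "color" in sub_question else 0.4

def toy_vqa(image, question, extra_context=()):
    """Stand-in VQA: answer using the question plus any sub-QA context."""
    return "red" if any("color" in c for c in extra_context) else "unknown"

def answer_with_sub_questions(image, question, k=1):
    # 1. Generate candidate sub-questions with the VQG module.
    candidates = toy_vqg(image, question)
    # 2. Keep the k most informative ones per the Info-score module.
    ranked = sorted(candidates,
                    key=lambda sq: toy_info_score(image, question, sq),
                    reverse=True)[:k]
    # 3. Answer each sub-question, then feed the sub-QA pairs to the
    #    VQA module as additional context for the original question.
    context = [f"{sq} -> {toy_vqa(image, sq)}" for sq in ranked]
    return toy_vqa(image, question, extra_context=context)
```

Under these toy stand-ins, answering "What is the ball used for?" would first surface the color sub-question (highest Info-score), whose answer then steers the final VQA prediction.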
Keywords
informative sub-questions, VQA model, Visual Question Answering, training VQA models, lower-level perceptual information, reasoning questions, auxiliary perceptual information, Visual Question Generation model, Info-score model, generated questions, original question, VQG model