Diverse Visual Question Generation based on Multiple Objects SelectionJust Accepted

ACM Transactions on Multimedia Computing, Communications, and Applications(2022)

引用 0|浏览0
暂无评分
摘要
Visual question generation task aims at generating high-quality questions about a given image. To make this task applicable to various scenarios, e.g., the growing demand for exams, it is important to generate diverse questions. The existing methods for this task control diverse question generation based on different question types, e.g., “what” and “when”. Although different question types lead to description diversity, they cannot guarantee semantic diversity when asking the same objects. Research in the field of psychology shows that humans pay attention to different objects in an image based on their preferences, which is beneficial to constructing semantically diverse questions. According to the research, we propose a multi-selector visual question generation (MS-VQG) model, which aims to focus on different objects to generate diverse questions. Specifically, our MS-VQG model employs multiple selectors to imitate different humans to select different objects in a given image. Based on these different selected objects, our MS-VQG model can generate diverse questions corresponding to each selector. Extensive experiments on two datasets show that our proposed model outperforms the baselines in generating diverse questions.
更多
查看译文
关键词
Multimodal,Visual Question Generation,Mixture of Experts
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要