CroMIC-QA: The Cross-Modal Information Complementation based Question Answering

IEEE Transactions on Multimedia(2023)

引用 0|浏览17
暂无评分
摘要
This paper proposes a new multi-modal question-answering task, named as Cross-Modal Information Complementation based Question Answering (CroMIC-QA), to promote the exploration on bridging the semantic gap between visual and linguistic signals. The proposed task is inspired by the common phenomenon that, in most user-generated QA scenarios, the information of the given textual question is incomplete, and thus it is required to merge the semantics of both the text and the accompanying image to infer the complete real question. In this work, the CroMIC-QA task is first formally defined and compared with the classic Visual Question Answering (VQA) task. On this basis, a specified dataset, CroMIC-QA-Agri, is collected from an online QA community in the agriculture domain for the proposed task. A group of experiments is conducted on this dataset, with the typical multi-modal deep architectures implemented and compared. The experimental results show that the appropriate text/image presentations and text-image semantic interaction methods are effective to improve the performance of the framework.
更多
查看译文
关键词
Cross-Modal Semantic Interaction,Visual Question Answering,Domain-Specific Datasets,Multi-Modal Tasks
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要