A dataset to answer visual questions about named entities

TRAITEMENT AUTOMATIQUE DES LANGUES(2022)

引用 0|浏览1
暂无评分
摘要
In the context of multimodal processing, we focus our work on Knowledge-based Visual Question Answering about named Entities (KVQAE). We provide ViQuAE, a novel dataset of 3,700 questions paired with images, annotated using a semi-automatic method. It is the first KVQAE dataset to cover a wide range of entity types, associated with a knowledge base composed of 1.5M Wikipedia articles paired with images. To set a baseline on the benchmark, we address KVQAE as a three-stage problem: initial Information Retrieval, Re-Ranking, and Reading Comprehension. The experiments empirically demonstrate the difficulty of the task and pave the way towards better multimodal entity representations.
更多
查看译文
关键词
Dataset,Knowledge-based Visual Question Answering,Multimodality
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要