Knowledge-Embedded Mutual Guidance for Visual Reasoning

IEEE Transactions on Cybernetics (2024)

Abstract
Visual reasoning between visual images and natural language is a long-standing challenge in computer vision. Most methods seek answers to questions based solely on an analysis of the given questions and images. Other approaches treat knowledge graphs as flattened tables in which to search for the answer. However, these works suffer from two major problems: 1) the model disregards the fact that the world surrounding us interlinks what we see with the natural language we hear and speak and 2) the model largely ignores the structure of the knowledge graph (KG). To overcome these deficiencies, a model should jointly consider the two modalities of vision and language, as well as the rich structural and logical information embedded in knowledge graphs. To this end, we propose a general joint representation learning framework for visual reasoning, namely, knowledge-embedded mutual guidance. It realizes mutual guidance not only between visual data and natural language descriptions but also between knowledge graphs and reasoning models. In addition, it exploits the knowledge derived from the reasoning model to enrich the KG when applied to the visual relation detection task. The experimental results demonstrate that the proposed approach performs dramatically better than state-of-the-art methods on two benchmarks for visual reasoning.
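
To make the mutual-guidance idea concrete, below is a minimal, hypothetical PyTorch sketch of one such block. Everything here is an illustrative assumption rather than the paper's actual architecture: the name MutualGuidanceBlock, the cross-attention layout, and the KG-injection step are all invented for exposition. The sketch shows the two ingredients the abstract names: each modality attends to the other (mutual guidance between vision and language), and both guided streams then attend to embeddings of retrieved KG entities (knowledge embedding).

```python
import torch
import torch.nn as nn

class MutualGuidanceBlock(nn.Module):
    """Hypothetical mutual-guidance block: vision and language features
    guide each other via cross-attention, then both attend to KG
    embeddings. Illustrative only; not the paper's exact architecture."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.vis_from_lang = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lang_from_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        # One shared KG-attention layer for both streams (a design choice).
        self.kg_inject = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_l = nn.LayerNorm(dim)

    def forward(self, vis, lang, kg):
        # vis:  (B, N_v, dim) region features from an image encoder
        # lang: (B, N_l, dim) token features from a question encoder
        # kg:   (B, N_k, dim) embeddings of retrieved KG entities/relations
        v2, _ = self.vis_from_lang(vis, lang, lang)  # language guides vision
        l2, _ = self.lang_from_vis(lang, vis, vis)   # vision guides language
        vis = self.norm_v(vis + v2)
        lang = self.norm_l(lang + l2)
        # Knowledge embedding: each guided stream attends to KG structure.
        vk, _ = self.kg_inject(vis, kg, kg)
        lk, _ = self.kg_inject(lang, kg, kg)
        return vis + vk, lang + lk

if __name__ == "__main__":
    block = MutualGuidanceBlock(dim=512, heads=8)
    vis = torch.randn(2, 36, 512)   # e.g., 36 detected regions per image
    lang = torch.randn(2, 20, 512)  # e.g., 20 question tokens
    kg = torch.randn(2, 50, 512)    # e.g., 50 retrieved KG facts
    v_out, l_out = block(vis, lang, kg)
    print(v_out.shape, l_out.shape)  # (2, 36, 512) and (2, 20, 512)
```

Sharing one KG-attention layer across both streams is only one plausible choice; separate per-stream layers or a gating mechanism over the KG features would fit the same high-level description equally well.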