Textual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching.

Mingliang Liang, Zhuoran Liu, Martha A. Larson

MMM (1) (2023)

Abstract

We propose a Textual Concept Expansion (TCE) approach for creating joint textual-visual embeddings. TCE uses a multi-label classifier that takes a caption as input and produces as output a set of concepts that are used to expand, i.e., enrich, the caption. TCE addresses the challenge of the limited number of concepts common between an image and its caption by leveraging general knowledge about the world, i.e., commonsense knowledge. Following a recent trend, the commonsense knowledge is acquired by creative use of the training data. We test TCE within a popular dual-stream approach, Consensus-aware Visual-Semantic Embedding (CVSE). CVSE leverages a graph that encodes the co-occurrence of concepts, which it takes to represent a consensus between the textual and visual modalities that captures commonsense knowledge. Experimental results demonstrate an improvement in image-text matching when TCE is used for the expansion of the background collection and the query. Query expansion, not possible in the original CVSE, is particularly helpful. TCE can be extended in the future to make use of data that is similar to the target domain, but is drawn from an additional, external data set.
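
The expansion step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract specifies a trained multi-label classifier, whereas this sketch substitutes a simple co-occurrence scorer estimated from training captions. The function names (`build_cooccurrence`, `expand_caption`), the `threshold` and `top_k` parameters, and the toy captions and vocabulary are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(captions, vocabulary):
    """Count single concepts and concept pairs that share a training caption.

    Hypothetical helper: a stand-in for the knowledge that a trained
    multi-label classifier would learn from the training data.
    """
    concept_counts = Counter()
    pair_counts = Counter()
    for caption in captions:
        present = sorted(set(caption.lower().split()) & vocabulary)
        concept_counts.update(present)
        for a, b in combinations(present, 2):
            # Store both directions so lookups do not depend on order.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return concept_counts, pair_counts


def expand_caption(caption, vocabulary, concept_counts, pair_counts,
                   threshold=0.5, top_k=3):
    """Return up to top_k concepts absent from the caption but likely relevant.

    Each candidate is scored by the average estimated probability
    P(candidate | concept) over the concepts already in the caption.
    threshold and top_k are assumed values, not taken from the paper.
    """
    present = set(caption.lower().split()) & vocabulary
    scores = {}
    for candidate in vocabulary - present:
        probs = [pair_counts[(c, candidate)] / concept_counts[c]
                 for c in present if concept_counts[c]]
        if probs:
            scores[candidate] = sum(probs) / len(probs)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, score in ranked[:top_k] if score >= threshold]


# Toy data, invented for this example only.
training_captions = [
    "a man rides a surfboard on a wave",
    "a surfer on a wave in the ocean",
    "a man on a surfboard in the ocean",
    "a dog runs on the grass",
]
vocabulary = {"man", "surfboard", "wave", "ocean", "surfer", "dog", "grass"}

concept_counts, pair_counts = build_cooccurrence(training_captions, vocabulary)
query = "a man on a surfboard"
added = expand_caption(query, vocabulary, concept_counts, pair_counts)
expanded_query = query + " " + " ".join(added)
print(expanded_query)
```

In this sketch the same function could be applied to captions in the background collection and to the query, mirroring the two uses of expansion mentioned in the abstract. How the expanded text is then embedded and matched against images is outside the scope of the sketch.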

Keywords

textual concept expansion, commonsense knowledge, dual-stream, image-text
