Learning A Limited Text Space For Cross-Media Retrieval

COMPUTER ANALYSIS OF IMAGES AND PATTERNS(2017)

引用 3|浏览75
暂无评分
摘要
In this paper, we propose a novel model for cross-media retrieval which relies on a limited text space rather than a common space or an image space. More specifically, the model consists of three parts: A visual part that consists of a convolutional neural network and an image understanding network; A language model part that achieves sentence understanding by recurrent neural network; An embedding part that contains a fusion layer to capture both visual label information and semantic correlations between images and sentences, as well as learn the final limited text space by optimizing pairwise ranking loss. Experimental results on three benchmark datasets show that our proposed model gains promising improvement in accuracy for cross-media retrieval especially on sentence retrieval compared with the related state-of-the-art methods.
更多
查看译文
关键词
Cross-media retrieval, Limited text space, Fusion layer, Image understanding network, Recurrent neural network
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要