Semi-supervised cross-modal retrieval with graph-based semantic alignment network

Computers and Electrical Engineering(2022)

Abstract
Semi-supervised cross-modal retrieval is a paradigm that learns common representations by exploiting the underlying semantic information in both labeled and unlabeled data. Most existing methods ignore the rich semantic information of text data and therefore cannot fully utilize it in common representation learning. Moreover, they consider only the correlation between data with the same semantic label and ignore the correlation between data with different semantic labels. In this paper, we propose a novel semi-supervised cross-modal retrieval method, called Graph-based Semantic Alignment Network (GSAN), which learns common representations by aligning the features of different modalities with the semantic embeddings of text data. First, we design a Deep Supervised Semantic Encoding (DSSE) module to train a semantic projector and a label predictor, which extract semantic embeddings and predicted labels from the unlabeled data of the text modality. Then, a GAN-based Bidirectional Fusion (GBF) module is designed to learn the mapping networks of the two modalities (image and text). To make the mapping networks generate semantically discriminative and modality-invariant representations, we use the semantic information extracted by DSSE to construct a Graph-based Triplet Constraint (GTC), which pulls the feature embeddings of semantically matched image-text pairs closer together and pushes mismatched ones apart. By making full use of semantic information, our approach achieves the performance of state-of-the-art methods while using less labeled data. In addition, since only the mapping networks trained in the GBF module are needed to generate common representations at the inference stage, our approach is efficient and time-saving in real-world applications. Extensive experiments on four widely used datasets show the effectiveness of GSAN.
Keywords
41A05,41A10,65D05,65D17