Multimodal Knowledge Graph-Guided Cross-Modal Graph Network for Image-Text Retrieval.

IEEE International Conference on Big Data and Smart Computing(2024)

引用 0|浏览6
暂无评分
摘要
Image-text retrieval is a fundamental cross-modal task, which dedicates to align the representation space between image modality and text modality. Existing cross-interactive image-text retrieval methods generate image and sentence em-beddings independently, introduce interaction-based networks for cross-modal reasoning, and then retrieve them using matching metrics. However, existing approaches do not consider fully utilizing semantic relationships among multimodal knowledge to enhance cross-modal fine-grained implicit semantic reasoning capabilities. In this paper, we propose Multimodal Knowledge Graph-guided Cross-modal Graph Network (MKCGN) that exploits multimodal knowledge graphs to explore cross-modal relationships and enhance global representations. In MKCGN, images generate semantic and spatial graphs, which are used to represent visual graphs, and sentences generate textual graphs based on word semantic relations. The visual and textual graphs are used to implement inter-modal reasoning respectively. Then we obtain interest embeddings of image regions and text words based on entity embed dings in Multimodal Knowledge Graph (MKG), which approximates and aligns the representation space of regions and words to a certain extent, thus obtaining effective inter-modal interactions and learning fine-grained cross-modal communication through graph node contrast loss for inter-modal semantic reasoning. Finally, we mine the implicit semantics and potential relationships of images and texts through the MKG as a means of enhancing the global representations and use cross-modal contrast loss to narrow the space of coarse-grained cross-modal representations. Experiments on the MS-COCO and Flickr30K benchmark datasets show that our proposed MKCGN outperforms state-of-the-art image-text retrieval methods.
更多
查看译文
关键词
Multimodal Fusion,Image-Text Retrieval,Mul-timodal Konwledge Graph
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要