Quaternion Representation Learning for Cross-Modal Matching

Knowledge-Based Systems (2023)

Abstract
The main challenge of cross-modal matching is to construct a shared subspace that reflects semantic closeness. Asymmetric relevance, especially in the one-to-many case where a single query has multiple correct correspondences in the other modality, seriously exacerbates this difficulty. Recently, many approaches based on deep metric learning and probability-distribution fitting have made great progress. However, these methods do not generalize well to other approaches because of their many hyper-parameters or uncontrollable instability. In addition, we argue that a common space learned in real-valued space has insufficient representational ability, and that the symmetric similarity calculations adopted by previous work fail to capture asymmetric relevance. To remedy these problems, this work introduces a novel and effective approach called Quaternion Representation Learning (QRL) for better cross-modal matching. Specifically, benefiting from the strong expressive power and richer representation capability of quaternion space with its three imaginary components, our proposed QRL method can better represent shared semantics. Owing to the inherent asymmetry of the Hamilton product and the latent inter-dependencies among its components, QRL can model asymmetric relevance and capture the complex similarity of intra- and inter-modal interactions. Another advantage is that QRL can be used in conjunction with other existing methods to further improve cross-modal matching. Extensive experiments on two commonly used image-text matching benchmarks, MSCOCO and Flickr30K, and two widely used video-text retrieval datasets, MSRVTT and TGIF, demonstrate the effectiveness and superiority of the proposed QRL method.
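The asymmetry the abstract attributes to the Hamilton product comes from quaternion multiplication being non-commutative: q1 ⊗ q2 ≠ q2 ⊗ q1 in general, so a query-to-target interaction differs from the reverse. A minimal illustrative sketch (this is not the paper's code, just the standard quaternion product on (w, x, y, z) tuples):

```python
# Sketch of the Hamilton product's non-commutativity, which is the
# property QRL relies on to model asymmetric query->target relevance.

def hamilton(q1, q2):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return (
        w1*w2 - x1*x2 - y1*y2 - z1*z2,  # real part
        w1*x2 + x1*w2 + y1*z2 - z1*y2,  # i component
        w1*y2 - x1*z2 + y1*w2 + z1*x2,  # j component
        w1*z2 + x1*y2 - y1*x2 + z1*w2,  # k component
    )

# The basis units already show the asymmetry: i*j = k but j*i = -k.
i = (0.0, 1.0, 0.0, 0.0)
j = (0.0, 0.0, 1.0, 0.0)
print(hamilton(i, j))  # (0.0, 0.0, 0.0, 1.0)  = k
print(hamilton(j, i))  # (0.0, 0.0, 0.0, -1.0) = -k
```

Because the two orderings yield different results, a similarity built from the Hamilton product can score (image → text) and (text → image) differently, unlike a symmetric cosine similarity.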
Keywords
matching, learning, cross-modal