Improving Image-Text Matching with Bidirectional Consistency of Cross-Modal Alignment

IEEE Transactions on Circuits and Systems for Video Technology (2024)

Abstract
Image-text matching is a fundamental task in bridging the semantics between vision and language. The key challenge lies in establishing accurate alignment between two heterogeneous modalities. Existing cross-modal fine-grained matching methods normally include two alignment directions, “word to region” and “region to word”, and the overall image-text similarity is calculated from these alignments. However, the two directions are typically aligned independently of each other, so consistency between them cannot be guaranteed. This inevitably introduces inconsistent alignments, which can lead to inaccurate image-text matching results. In this paper, we propose a novel Bidirectional cOnsistency netwOrks for cross-Modal alignment (BOOM), which achieves more accurate cross-modal semantic alignments by imposing explicit consistency constraints in both directions. Specifically, according to three aspects reflected by alignment consistency, i.e., significance, wholeness, and alignment orderliness, we design a novel, systematic set of multi-granularity consistency constraints: point-wise consistency, which enforces consistency of the most significant single word item in the bidirectional alignments; set-wise consistency, which keeps the entire set of bidirectional alignment values consistent, more comprehensively and accurately; and order-wise consistency, which ensures that the bidirectional alignment results are consistent in order. Bidirectional cross-modal alignment between words and regions is thus corrected from three different perspectives: maximum, distribution, and order. Extensive experiments on two benchmarks, i.e., Flickr30K and MS-COCO, demonstrate that our BOOM achieves state-of-the-art performance.
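The abstract does not give formulas, so the following is only an illustrative sketch of the general idea, not BOOM's actual constraints or losses. It assumes a word-region similarity matrix `sim`, derives the two alignment directions by softmax normalisation (an assumption), and measures disagreement from the three perspectives named above: maximum, distribution, and order. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_alignments(sim):
    """sim: (n_words, n_regions) similarity matrix (assumed input)."""
    w2r = softmax(sim, axis=1)  # "word to region": each row sums to 1
    r2w = softmax(sim, axis=0)  # "region to word": each column sums to 1
    return w2r, r2w

def pointwise_gap(w2r, r2w):
    # Maximum view: fraction of words whose top region does not
    # select that same word as its top word.
    top_region = w2r.argmax(axis=1)
    top_word = r2w.argmax(axis=0)
    return float(np.mean(top_word[top_region] != np.arange(w2r.shape[0])))

def setwise_gap(w2r, r2w):
    # Distribution view: mean absolute difference between the two
    # full alignment matrices (a stand-in for a proper divergence).
    return float(np.abs(w2r - r2w).mean())

def orderwise_gap(w2r, r2w):
    # Order view: fraction of (word, region pair) cases where the two
    # directions rank the pair of regions in opposite order.
    i, j = np.triu_indices(w2r.shape[1], k=1)
    if i.size == 0:
        return 0.0
    d1 = np.sign(w2r[:, i] - w2r[:, j])
    d2 = np.sign(r2w[:, i] - r2w[:, j])
    return float(np.mean(d1 * d2 < 0))
```

In a training setting, quantities like these would have to be replaced by differentiable surrogates; the hard `argmax` and sign comparisons here are only meant to make the three notions of consistency concrete.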
Keywords
image-text matching, fine-grained matching, bidirectional consistent, cross-modal alignment