Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
CoRR (2024)
Abstract
Cross-lingual Cross-modal Retrieval (CCR) is an essential task in web search
that aims to break the barriers between modality and language simultaneously,
achieving image-text retrieval in multilingual scenarios with a single model.
In recent years, excellent progress has been made in cross-lingual
cross-modal pre-training; in particular, methods based on contrastive
learning over large-scale data have significantly improved retrieval performance.
However, these methods directly follow existing pre-training recipes from the
cross-lingual or cross-modal domain, leading to two kinds of inconsistency in
CCR: methods in the cross-lingual style suffer from intra-modal error
propagation, yielding inconsistent recall performance across languages over
the whole dataset, while methods in the cross-modal style suffer from
inter-modal optimization direction bias, yielding inconsistent ranks across
languages within each instance, an effect that Recall@K cannot capture. To solve
these problems, we propose a simple but effective 1-to-K contrastive learning
method that treats each language equally and eliminates both error propagation
and optimization bias. In addition, we propose a new evaluation metric, Mean
Rank Variance (MRV), to reflect rank inconsistency across languages within
each instance. Extensive experiments on four CCR datasets show that our method
improves both recall rates and MRV with smaller-scale pre-training data,
achieving a new state of the art.
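
The abstract does not spell out the loss formulation, but a 1-to-K contrastive
objective can be read as contrasting each image against all K parallel captions
(one per language) as equal positives in the same batch. Below is a minimal
PyTorch sketch under that assumption; the function name, tensor shapes, and
temperature value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def one_to_k_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Hypothetical 1-to-K contrastive loss sketch.

    img_emb: (B, D) image embeddings.
    txt_emb: (B, K, D) caption embeddings, K languages per image.
    Each image's K captions are all treated as positives, so every
    language is pulled toward the image with equal weight.
    """
    B, K, D = txt_emb.shape
    img = F.normalize(img_emb, dim=-1)                     # (B, D)
    txt = F.normalize(txt_emb.reshape(B * K, D), dim=-1)   # (B*K, D)

    logits = img @ txt.t() / temperature                   # (B, B*K)

    # Image -> text: each image has K positive captions (its own K languages).
    pos_mask = torch.zeros(B, B * K, dtype=torch.bool, device=logits.device)
    for i in range(B):
        pos_mask[i, i * K:(i + 1) * K] = True
    log_prob = logits.log_softmax(dim=-1)
    loss_i2t = -log_prob[pos_mask].view(B, K).mean()

    # Text -> image: each caption has exactly one positive image.
    targets = torch.arange(B, device=logits.device).repeat_interleave(K)
    loss_t2i = F.cross_entropy(logits.t(), targets)

    return 0.5 * (loss_i2t + loss_t2i)

# Example: batch of 4 images, each with 3 parallel captions in 256-d space.
img = torch.randn(4, 256)
txt = torch.randn(4, 3, 256)
print(one_to_k_contrastive_loss(img, txt))
```

Because all K languages appear as positives for the same image in one softmax,
no language's gradient depends on another language's retrieval result, which is
one plausible way to avoid the error propagation and direction bias the
abstract describes.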
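The paper also does not define MRV here, but the name and description suggest
the variance of an instance's retrieval rank across the K query languages,
averaged over all instances. The sketch below follows that reading; the input
layout (`ranks[i][k]` = rank of instance i's ground truth when queried in
language k) is an assumption for illustration.

```python
import numpy as np

def mean_rank_variance(ranks):
    """Hypothetical MRV sketch: per-instance variance of ranks across
    languages, then the mean over instances. Lower MRV means the model
    ranks the same instance more consistently across languages."""
    ranks = np.asarray(ranks, dtype=float)   # shape (N, K)
    return ranks.var(axis=1).mean()

# Example: two instances, three languages. The second instance's rank
# swings between 1 and 10, so it dominates the MRV.
print(mean_rank_variance([[1, 1, 2], [3, 10, 1]]))
```

Unlike Recall@K, which only checks whether the ground truth lands in the top K
per language, a variance-style metric is sensitive to rank disagreement across
languages even when every language individually clears the Recall@K threshold.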