Multimodal Encoders for Food-Oriented Cross-Modal Retrieval

Web and Big Data, APWeb-WAIM 2021, Part II (2021)

Abstract
Cross-modal retrieval plays a critical role in food-oriented applications. Modality alignment, in which a common embedding space is learned so that items from the two modalities can be compared and retrieved directly, remains a challenging part of the process. Recent studies mainly rely on adversarial or reconstruction losses to align the modalities; however, the features extracted from each modality may be insufficient, resulting in low-quality alignments. In contrast, this paper proposes a method that combines multimodal encoders with adversarial learning to learn improved and efficient cross-modal embeddings for retrieval. The core of the proposed approach is a directional pairwise cross-modal attention that latently adapts representations from one modality to the other. Although the model is not particularly complex, experimental results on the Recipe1M benchmark show that the proposed method outperforms current state-of-the-art methods.
Keywords
Food-oriented computing, Cross-modal retrieval, Multimodal encoders, Modality alignment
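The abstract only sketches the directional pairwise cross-modal attention mechanism. The snippet below is a minimal, hypothetical PyTorch sketch of what such a block could look like, with queries taken from one modality (e.g., recipe tokens) and keys/values from the other (e.g., image regions); all class names, dimensions, and design choices here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DirectionalCrossModalAttention(nn.Module):
    """Hypothetical sketch of directional pairwise cross-modal attention.

    Tokens of a target modality attend to tokens of a source modality,
    so the target representations are latently adapted toward the source.
    This is not the paper's code; it only illustrates the general idea.
    """

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target: (batch, T_t, dim) tokens to be adapted (e.g., recipe text)
        # source: (batch, T_s, dim) tokens providing context (e.g., image regions)
        adapted, _ = self.attn(query=target, key=source, value=source)
        # Residual connection preserves the original target information.
        return self.norm(target + adapted)


if __name__ == "__main__":
    text = torch.randn(2, 20, 512)   # assumed recipe-token features
    image = torch.randn(2, 49, 512)  # assumed image-region features
    block = DirectionalCrossModalAttention()
    print(block(text, image).shape)  # torch.Size([2, 20, 512])
```

Because the attention is directional, a second block with the roles of the two modalities swapped would be needed to adapt image features toward the text, which is one plausible reading of "pairwise" in the abstract.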