
Learning to Align Arabic and English Text to Remote Sensing Images Using Transformers

2022 IEEE Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS) (2022)

Abstract
With the rapid growth of multimodal remote sensing (RS) data, cross-modal retrieval offers greater flexibility in image retrieval tasks. However, image retrieval across different modalities is still an open challenge in the RS community. Inspired by the recent success of transformers in natural language processing and computer vision applications, we present a transformer-based method for text-image retrieval, which consists of separate encoders for textual and visual features. Specifically, we adopt Arabic and English captions for the text modality. We then investigate two paradigms. In the first paradigm, we learn each language independently. In the second paradigm, we jointly learn both Arabic and English. The experimental results on two cross-modal datasets confirm the promising capabilities of the proposed method.
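
The retrieval step implied by a two-encoder design can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state the similarity measure or training loss, so cosine similarity is an assumption here, and the function names and toy embedding values are hypothetical stand-ins for the outputs of the text and image encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_images(text_embedding, image_embeddings):
    """Return image indices sorted from most to least similar to the caption."""
    query = l2_normalize(text_embedding)
    scored = []
    for idx, image_embedding in enumerate(image_embeddings):
        unit = l2_normalize(image_embedding)
        similarity = sum(q * v for q, v in zip(query, unit))
        scored.append((similarity, idx))
    # Highest similarity first; ties broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


# Made-up embeddings standing in for encoder outputs (assumption).
image_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
# Hypothetical embedding of an Arabic or English caption.
caption_embedding = [0.9, 0.1, 0.0]

ranking = rank_images(caption_embedding, image_embeddings)
print(ranking)
```

Under this sketch, the two paradigms would differ only in how the caption embedding is produced: one text encoder per language in the first, a single text encoder trained on both languages in the second.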
Keywords
remote sensing image retrieval, text-image retrieval, transformers, cross-language