Learning to align Arabic and English text to remote sensing images using transformers
2022 IEEE Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS) (2022)
Abstract
With the rapid growth of multimodal remote sensing (RS) data, cross-modal retrieval offers greater flexibility in image retrieval tasks. However, image retrieval across different modalities remains an open challenge in the RS community. Inspired by the recent success of transformers in natural language processing and computer vision, we present a transformer-based method for text-image retrieval that uses separate encoders for textual and visual features. Specifically, we adopt Arabic and English captions for the text modality and investigate two paradigms. In the first paradigm, we learn each language independently. In the second paradigm, we learn Arabic and English jointly. The experimental results on two cross-modal datasets confirm the promising capabilities of the proposed method.
Keywords
remote sensing image retrieval, text-image retrieval, transformers, cross-language
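The abstract describes separate encoders for text and images but gives no detail on how captions are matched to images. As a minimal sketch of the retrieval step only, the snippet below assumes that both encoders map into a shared embedding space and that candidates are ranked by cosine similarity. Both points are assumptions, not details confirmed by the abstract. The function names and toy vectors are hypothetical and stand in for real encoder outputs.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_images(text_emb: Sequence[float],
                image_embs: Sequence[Sequence[float]]) -> List[int]:
    """Return image indices sorted by cosine similarity to the caption, best first."""
    q = l2_normalize(text_emb)
    scores = []
    for img in image_embs:
        v = l2_normalize(img)
        # Dot product of unit vectors equals cosine similarity.
        scores.append(sum(a * b for a, b in zip(q, v)))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy embeddings standing in for encoder outputs (made-up numbers, assumption).
caption_embedding = [0.9, 0.1, 0.0]   # e.g. an Arabic or English caption
image_embeddings = [
    [0.0, 1.0, 0.0],   # image 0
    [1.0, 0.2, 0.0],   # image 1
    [0.0, 0.0, 1.0],   # image 2
]
ranking = rank_images(caption_embedding, image_embeddings)
```

Under these assumptions, the two paradigms in the abstract would differ only in how the text encoder is trained (one per language, or one shared across Arabic and English). The ranking step would stay the same in either case.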