Local-enhanced multi-scale aggregation swin transformer for semantic segmentation of high-resolution remote sensing images

International Journal of Remote Sensing (2024)

Abstract
Semantic segmentation of remote sensing images is crucial for many practical applications. Within deep learning, convolutional neural networks (CNNs) have been the primary approach to semantic segmentation over the past decade. Recently, Transformer-based models have achieved superior segmentation performance owing to their strong global modelling capabilities. However, Transformer-based models tend to focus on extracting global contextual information, which leads to suboptimal segmentation of local edges and makes it difficult to preserve fine-grained details during patch token downsampling. Inspired by the local receptive field of CNNs, this article proposes a Local-Enhanced Multi-Scale Aggregation Swin Transformer (LMA-Swin) for semantic segmentation of high-resolution remote sensing images. Specifically, we adopt the Swin Transformer as the main encoder, introduce convolutional blocks as an auxiliary encoder, and design a feature modulation module (FMM) to integrate the local contextual modelling ability of CNNs into the Transformer backbone. Additionally, we propose a novel cross-aggregation decoder (CAD) that aggregates shallow edge information and deep semantic information, thereby enhancing the discriminative ability for multi-scale objects. Experiments on the ISPRS Vaihingen and Potsdam datasets show that the proposed approach yields a noteworthy improvement in segmentation performance. Code: https://github.com/patricklee16/LMA-Swin.
Keywords
Local enhancement, multi-scale aggregation, feature modulation, semantic segmentation, remote sensing
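
The abstract names two ideas without giving their internals: modulating Transformer features with local CNN features, and aggregating shallow and deep feature maps. The toy sketch below illustrates only the general shape of those ideas on single-channel 2-D grids in plain Python. Everything in it is an assumption made for illustration: the function names `modulate`, `upsample2x`, and `aggregate`, the sigmoid gate with a residual path, and the nearest-neighbour upsampling followed by addition. It should not be read as the FMM or CAD defined in the paper; the linked repository is the place to look for the actual design.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a hypothetical gate."""
    return 1.0 / (1.0 + math.exp(-x))


def modulate(global_feat, local_feat):
    """Assumed gating fusion: each local value produces a gate in (0, 1)
    that rescales the matching global value, with a residual path so the
    global feature is never suppressed below its original magnitude."""
    return [
        [g * (1.0 + sigmoid(l)) for g, l in zip(g_row, l_row)]
        for g_row, l_row in zip(global_feat, local_feat)
    ]


def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2-D grid (list of rows)."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def aggregate(shallow, deep):
    """Assumed multi-scale fusion: upsample the coarse (deep) map to the
    resolution of the fine (shallow) map and add them element-wise."""
    up = upsample2x(deep)
    return [
        [s + d for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(shallow, up)
    ]


if __name__ == "__main__":
    fused = modulate([[2.0, 4.0]], [[0.0, 0.0]])
    print(fused)
    print(aggregate([[1, 1], [1, 1]], [[5]]))
```

In a real network these operations would act on multi-channel tensors with learned convolutional weights; the sketch keeps only the data flow, in which local features rescale global ones and coarse maps are brought to the fine resolution before merging.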