DAGNet: Depth-aware Glass-like objects segmentation via cross-modal attention

Journal of Visual Communication and Image Representation (2024)

Abstract
Transparent and specular objects, such as mirrors, glass windows, and glass walls, pose significant challenges for computer vision tasks. Glass-like Objects (GLOs) lack a distinctive visual appearance and a fixed external shape, which makes GLO segmentation difficult. In this study, we propose a novel bidirectional cross-modal fusion framework with shifted-window cross-attention for GLO segmentation. The framework incorporates a Feature Exchange Module (FEM) and a Shifted-Window Cross-Attention Fusion Module (SW-CAFM) in each transformer block stage to calibrate, exchange, and fuse cross-modal features. The FEM employs coordinate and spatial attention mechanisms to filter out noise and recalibrate the features of the two modalities. The SW-CAFM fuses RGB and depth features with cross-attention, leveraging the shifted-window attention scheme to reduce the computational complexity of the cross-attention operation. Experimental results demonstrate the feasibility and effectiveness of the proposed method, which achieves state-of-the-art results on several glass and mirror benchmarks: mIoU accuracies of 90.32%, 94.24%, 88.76%, and 87.47% on the GDD, Trans10K, MSD, and RGBD-Mirror datasets, respectively.
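The core idea of restricting cross-attention to local windows can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's implementation: it omits the learned query/key/value projections, multi-head attention, the window shift between successive blocks, and the FEM recalibration. It only shows how RGB tokens can query depth tokens within non-overlapping windows, which is what keeps the attention cost linear in image size rather than quadratic.

```python
import numpy as np

def window_partition(x, ws):
    """Split a (H, W, C) feature map into non-overlapping windows of ws*ws tokens."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_cross_attention(rgb, depth, ws=4):
    """Toy cross-attention: RGB queries attend to depth keys/values per window."""
    q = window_partition(rgb, ws)            # (num_windows, ws*ws, C)
    kv = window_partition(depth, ws)         # same partition on the depth branch
    C = q.shape[-1]
    # scaled dot-product attention restricted to each local window
    attn = softmax(q @ kv.transpose(0, 2, 1) / np.sqrt(C))
    return attn @ kv                         # fused tokens, same shape as q

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 8, 16))        # hypothetical 8x8 feature map, 16 channels
depth = rng.standard_normal((8, 8, 16))
fused = window_cross_attention(rgb, depth, ws=4)
print(fused.shape)  # (4, 16, 16): 4 windows of 16 tokens each, 16 channels
```

With an H×W feature map and window size `ws`, each of the (H/ws)·(W/ws) windows computes attention over only ws² tokens, so the cost grows linearly with the number of windows instead of quadratically with H·W.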
Keywords
Semantic segmentation, Transparent, Cross-modal, Self-attention