DAGNet: Depth-aware Glass-like objects segmentation via cross-modal attention

Journal of Visual Communication and Image Representation (2024)

Abstract
Transparent and specular objects, such as mirrors, glass windows, and glass walls, pose significant challenges for computer vision tasks. Glass-like Objects (GLOs) lack a distinctive visual appearance and a fixed external shape, which makes GLO segmentation difficult. In this study, we propose a novel bidirectional cross-modal fusion framework with shifted-window cross-attention for GLO segmentation. The framework incorporates a Feature Exchange Module (FEM) and a Shifted-Window Cross-Attention Fusion Module (SW-CAFM) in each transformer block stage to calibrate, exchange, and fuse cross-modal features. The FEM employs coordinate and spatial attention mechanisms to filter out noise and recalibrate the features of the two modalities. The SW-CAFM fuses RGB and depth features with cross-attention, leveraging the shifted-window attention scheme to reduce the computational complexity of the cross-attention operation. Experimental results demonstrate the feasibility and effectiveness of the proposed method, which achieves state-of-the-art results on several glass and mirror benchmarks: mIoU accuracies of 90.32%, 94.24%, 88.76%, and 87.47% on the GDD, Trans10K, MSD, and RGBD-Mirror datasets, respectively.
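The core idea of restricting cross-attention to local windows can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's implementation: it omits the learned query/key/value projections, multi-head attention, the window shift between successive blocks, and the FEM recalibration. It only shows how RGB tokens can query depth tokens within non-overlapping windows, which is what keeps the attention cost linear in image size rather than quadratic.

```python
import numpy as np

def window_partition(x, ws):
    """Split a (H, W, C) feature map into non-overlapping windows of ws*ws tokens."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_cross_attention(rgb, depth, ws=4):
    """Toy cross-attention: RGB queries attend to depth keys/values per window."""
    q = window_partition(rgb, ws)            # (num_windows, ws*ws, C)
    kv = window_partition(depth, ws)         # same partition on the depth branch
    C = q.shape[-1]
    # scaled dot-product attention restricted to each local window
    attn = softmax(q @ kv.transpose(0, 2, 1) / np.sqrt(C))
    return attn @ kv                         # fused tokens, same shape as q

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 8, 16))        # hypothetical 8x8 feature map, 16 channels
depth = rng.standard_normal((8, 8, 16))
fused = window_cross_attention(rgb, depth, ws=4)
print(fused.shape)  # (4, 16, 16): 4 windows of 16 tokens each, 16 channels
```

With an H×W feature map and window size `ws`, each of the (H/ws)·(W/ws) windows computes attention over only ws² tokens, so the cost grows linearly with the number of windows instead of quadratically with H·W.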
Keywords
Semantic segmentation, Transparent, Cross-modal, Self-attention