D-CONFORMER: Deformable Sparse Transformer Augmented Convolution for Voxel-Based 3D Object Detection

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
Although CNN-based and Transformer-based detectors have made impressive improvements in 3D object detection, these two network paradigms suffer from insufficient receptive fields and weakened local detail, which significantly limits the feature extraction performance of the backbone. In this paper, we propose to fuse convolution and transformer. Because non-empty voxels at different positions in 3D space contribute differently to object detection, applying standard convolution and transformer directly to voxels is not appropriate. Specifically, we design a novel deformable sparse transformer, termed D-Conformer, that performs long-range information interaction on the fine-grained local detail semantics aggregated by focal sparse convolution. D-Conformer learns, in a position-wise manner, which voxels in sparse space are valuable, and it can be applied to most voxel-based detectors as a backbone. Extensive experiments demonstrate that our method achieves satisfactory detection results and outperforms state-of-the-art 3D detection methods by a large margin.
Keywords
3D object detection,deformable sparse transformer,focal sparse convolution,KITTI dataset