RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM
arXiv (2024)
Abstract
Multi-modal 3D object detectors are dedicated to exploring secure and
reliable perception systems for autonomous driving (AD). However, while
achieving state-of-the-art (SOTA) performance on clean benchmark datasets, they
tend to overlook the complexity and harsh conditions of real-world
environments. Meanwhile, with the emergence of visual foundation models (VFMs),
opportunities and challenges are presented for improving the robustness and
generalization of multi-modal 3D object detection in autonomous driving.
Therefore, we propose RoboFusion, a robust framework that leverages VFMs like
SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the
original SAM to autonomous driving scenarios, yielding SAM-AD. To align SAM or
SAM-AD with multi-modal methods, we then introduce AD-FPN to upsample the
image features extracted by SAM. Next, we apply wavelet decomposition to denoise
the depth-guided images, further reducing noise and weather interference.
Lastly, we employ self-attention mechanisms to adaptively reweight the fused
features, enhancing informative features while suppressing excess noise. In
summary, our RoboFusion gradually reduces noise by leveraging the
generalization and robustness of VFMs, thereby enhancing the resilience of
multi-modal 3D object detection. Consequently, our RoboFusion achieves
state-of-the-art performance in noisy scenarios, as demonstrated by the KITTI-C
and nuScenes-C benchmarks.
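The wavelet-based denoising step mentioned above can be sketched with a single-level 2D Haar decomposition: the image is split into an approximation band plus three detail bands, the detail coefficients (where high-frequency noise concentrates) are soft-thresholded, and the image is reconstructed. This is a minimal, hypothetical sketch using NumPy; the paper's actual decomposition scheme, wavelet basis, and thresholding rule may differ.

```python
import numpy as np

def haar_denoise(img, thresh=0.1):
    """Single-level 2D Haar wavelet denoising (illustrative sketch).

    Decomposes the image into approximation (LL) and detail (LH, HL, HH)
    subbands, soft-thresholds the detail coefficients, and reconstructs.
    Assumes even height and width.
    """
    img = np.asarray(img, dtype=np.float64)
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right

    # Orthonormal Haar analysis: approximation + three detail subbands
    LL = (a + b + c + d) / 2.0
    LH = (a + b - c - d) / 2.0
    HL = (a - b + c - d) / 2.0
    HH = (a - b - c + d) / 2.0

    # Soft-threshold detail coefficients (shrink toward zero)
    soft = lambda x: np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)
    LH, HL, HH = soft(LH), soft(HL), soft(HH)

    # Haar synthesis (exact inverse when thresh == 0)
    out = np.empty_like(img)
    out[0::2, 0::2] = (LL + LH + HL + HH) / 2.0
    out[0::2, 1::2] = (LL + LH - HL - HH) / 2.0
    out[1::2, 0::2] = (LL - LH + HL - HH) / 2.0
    out[1::2, 1::2] = (LL - LH - HL + HH) / 2.0
    return out
```

With `thresh=0` the transform is perfectly invertible; larger thresholds suppress more high-frequency detail, trading fine texture for noise removal.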