Towards Robust LiDAR-Camera Fusion in BEV Space via Mutual Deformable Attention and Temporal Aggregation

Jian Wang, Fan Li, Yi An, Xuchong Zhang, Hongbin Sun

IEEE Transactions on Circuits and Systems for Video Technology (2024)

Abstract
LiDAR and camera are two critical sensors that provide complementary information for accurate 3D object detection. Most prior work focuses on improving the detection performance of fusion models on clean, well-curated datasets. In real-world scenarios, however, the collected point clouds and images may be corrupted to varying degrees by sensor malfunctions, which greatly degrades the robustness of the fusion model and poses a threat to safe deployment. In this paper, we first analyze the main shortcoming of most fusion detectors, namely their heavy reliance on the LiDAR branch, and the potential of the bird's-eye-view (BEV) paradigm for handling partial sensor failures. Based on this analysis, we present a robust LiDAR-camera fusion pipeline in a unified BEV space with two novel designs, evaluated under four typical LiDAR-camera malfunction cases. Specifically, a mutual deformable attention module is proposed to dynamically model spatial feature relationships and reduce the interference caused by a corrupted modality, and a temporal aggregation module is devised to fully exploit the rich information in the temporal domain. Together with decoupled feature extraction for each modality and holistic fusion in BEV space, the proposed detector, termed RobBEV, works stably regardless of single-modality data corruption. Extensive experiments on the large-scale nuScenes dataset under robust settings demonstrate the effectiveness of our approach.
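To make the "mutual deformable attention" idea concrete, the following is a minimal PyTorch sketch of one direction of such a module, written from the abstract's description rather than the paper's actual implementation: BEV queries from one modality predict a small set of sampling offsets and weights, then gather features from the other modality's BEV map at those locations. All class and parameter names here (e.g. `MutualDeformableAttention`, `n_points`) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualDeformableAttention(nn.Module):
    """Illustrative sketch: queries from one BEV map sample features from
    the other modality's BEV map at learned offset locations (one direction
    of a mutual scheme; apply twice, swapping roles, for both modalities)."""

    def __init__(self, dim: int, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        self.offset = nn.Linear(dim, 2 * n_points)  # (dx, dy) per sampling point
        self.weight = nn.Linear(dim, n_points)      # attention weight per point
        self.proj = nn.Linear(dim, dim)             # output projection

    def forward(self, query_bev: torch.Tensor, value_bev: torch.Tensor) -> torch.Tensor:
        # query_bev, value_bev: (B, C, H, W) BEV feature maps of the two modalities
        B, C, H, W = query_bev.shape
        q = query_bev.flatten(2).transpose(1, 2)                  # (B, H*W, C)
        offsets = self.offset(q).view(B, H * W, self.n_points, 2)  # learned shifts
        weights = self.weight(q).softmax(-1)                       # (B, H*W, P)

        # Reference grid over the BEV plane in [-1, 1], as grid_sample expects.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
        ref = torch.stack([xs, ys], dim=-1).view(1, H * W, 1, 2).to(query_bev)
        loc = ref + offsets / max(H, W)  # small learned deformations around each cell

        # Bilinearly sample the other modality's features at the offset locations.
        sampled = F.grid_sample(value_bev, loc, align_corners=True)  # (B, C, H*W, P)
        out = (sampled * weights.unsqueeze(1)).sum(-1)               # (B, C, H*W)
        out = self.proj(out.transpose(1, 2)).transpose(1, 2)
        return out.view(B, C, H, W)

# Usage: cross-enhance LiDAR and camera BEV features in both directions.
attn = MutualDeformableAttention(dim=64)
lidar_bev = torch.randn(2, 64, 16, 16)
camera_bev = torch.randn(2, 64, 16, 16)
lidar_enhanced = lidar_bev + attn(lidar_bev, camera_bev)   # LiDAR queries camera
camera_enhanced = camera_bev + attn(camera_bev, lidar_bev)  # camera queries LiDAR
```

Because the sampling weights are predicted from the query modality, a heavily corrupted value modality can in principle be down-weighted, which matches the abstract's claim of reducing interference from a failed sensor; the actual RobBEV module may differ in structure and detail.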
Keywords
3D object detection, LiDAR-camera fusion, model robustness