MENet: Multi-Modal Mapping Enhancement Network for 3D Object Detection in Autonomous Driving

IEEE Transactions on Intelligent Transportation Systems (2024)

Abstract
To achieve more accurate perception, LiDAR and cameras are increasingly used together to improve 3D object detection. However, building an effective fusion mechanism remains a non-trivial task, and this hinders the development of multi-modal methods. In particular, the construction of the mapping relationship between the two modalities is far from fully explored. Canonical cross-modal mapping fails when the calibration matrix is inaccurate, and it also discards much of the quantity and density of RGB image information. This paper aims to extend the traditional one-to-one alignment between LiDAR and camera. For all projected point clouds, we enhance their cross-modal mapping relationship by aggregating color-texture-related and shape-contour-related features. Furthermore, a mapping pyramid is proposed to leverage the semantic representation of image features at different stages. Based on these mapping enhancement strategies, our method increases the utilization of image information. Finally, we design an attention-based fusion module to improve the point cloud features with auxiliary image features. Extensive experiments on the KITTI and SUN RGB-D datasets show that our model achieves satisfactory 3D object detection, especially for categories with sparse point clouds, compared with other multi-modal fusion networks.
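The abstract describes an attention-based fusion module that refines point cloud features with auxiliary image features gathered at projected pixel locations. The sketch below is a minimal, hypothetical illustration of that general idea (a learned per-point gate over projected image features); the module name, dimensions, and gating design are assumptions for illustration, not the paper's actual MENet architecture.

```python
# Illustrative sketch only: a generic attention-based point-image fusion block,
# NOT the authors' MENet implementation. All names and dimensions are hypothetical.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuse per-point LiDAR features with sampled image features via an attention gate."""

    def __init__(self, point_dim: int = 64, image_dim: int = 64):
        super().__init__()
        # Project image features into the point feature dimension.
        self.img_proj = nn.Linear(image_dim, point_dim)
        # Predict a per-point attention weight from the concatenated modalities.
        self.attn = nn.Sequential(
            nn.Linear(point_dim * 2, point_dim),
            nn.ReLU(inplace=True),
            nn.Linear(point_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, point_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # point_feat: (N, point_dim) features of projected points
        # image_feat: (N, image_dim) image features gathered at the projected pixels
        img = self.img_proj(image_feat)
        weight = self.attn(torch.cat([point_feat, img], dim=-1))  # (N, 1) gate in [0, 1]
        # Enhance each point feature with its attention-weighted image feature.
        return point_feat + weight * img


if __name__ == "__main__":
    fusion = AttentionFusion(point_dim=64, image_dim=64)
    points = torch.randn(1024, 64)  # dummy LiDAR point features
    pixels = torch.randn(1024, 64)  # dummy image features at projected locations
    print(fusion(points, pixels).shape)  # torch.Size([1024, 64])
```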
Keywords
Multi-modal fusion, 3D object detection, mapping enhancement, autonomous driving