Chrome Extension
WeChat Mini Program
Use on ChatGLM

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

arXiv (Cornell University)(2021)

Cited 0|Views75
No score
Abstract
Autonomous driving perceives the surrounding environment for decision making, which is one of the most complicated scenes for visual perception. The great power of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, feasible, and scalable paradigm for pushing the performance boundary in this area. To this end, we contribute the BEVDet paradigm in this paper. BEVDet is developed by following the principle of detecting the 3D objects in Bird-Eye-View (BEV), where route planning can be handily performed. In this paradigm, four kinds of modules are conducted in succession with different roles: an image-view encoder for encoding feature in image view, a view transformer for feature transformation from image view to BEV, a BEV encoder for further encoding feature in BEV, and a task-specific head for predicting the targets in BEV. We merely reuse the existing modules for constructing BEVDet and make it feasible for multi-camera 3D object detection by constructing an exclusive data augmentation strategy. The proposed paradigm works well in multi-camera 3D object detection and offers a good trade-off between computing budget and performance. BEVDet with 704x256 (1/8 of the competitors) image size scores 29.4% mAP and 38.4% NDS on the nuScenes val set, which is comparable with FCOS3D (i.e., 2008.2 GFLOPs, 1.7 FPS, 29.5% mAP and 37.2% NDS), while requires merely 12% computing budget of 239.4 GFLOPs and runs 4.3 times faster. Scaling up the input size to 1408x512, BEVDet scores 34.9% mAP, and 41.7% NDS, which requires just 601.4 GFLOPs and significantly suppresses FCOS3D by 5.4% mAP and 4.5% NDS. The superior performance of BEVDet tells the magic of paradigm innovation.
More
Translated text
Key words
3d,detection,high-performance,multi-camera,bird-eye-view
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined