SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras
CoRR (2024)
Abstract
Autonomous driving research has shown considerable interest in
approaches that directly infer 3D objects in the Bird's Eye View (BEV) from
multiple cameras. Some attempts have also explored utilizing 2D detectors from
single images to enhance the performance of 3D detection. However, these
approaches rely on a two-stage process with separate detectors, where the 2D
detection results are utilized only once for token selection or query
initialization. In this paper, we present a single model termed SimPB, which
simultaneously detects 2D objects in the perspective view and 3D objects in the
BEV space from multiple cameras. To achieve this, we introduce a hybrid decoder
consisting of several multi-view 2D decoder layers and several 3D decoder
layers, specifically designed for their respective detection tasks. A Dynamic
Query Allocation module and an Adaptive Query Aggregation module are proposed
to continuously update and refine the interaction between 2D and 3D results, in
a cyclic 3D-2D-3D manner. Additionally, Query-group Attention is utilized to
strengthen the interaction among 2D queries within each camera group. In the
experiments, we evaluate our method on the nuScenes dataset and demonstrate
promising results for both 2D and 3D detection tasks. Our code is available at:
https://github.com/nullmax-vision/SimPB.
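
The abstract does not say how Query-group Attention is implemented. One common way to restrict interaction to queries within the same camera group is a block-structured attention mask, and the following is a minimal sketch under that assumption. It is not taken from the SimPB code base: the names `group_attention_mask`, `masked_self_attention`, and `camera_of_query`, as well as the toy feature values, are hypothetical and chosen only for illustration.

```python
import math


def group_attention_mask(group_ids):
    """Return an N x N boolean mask; True means query i may attend to query j.

    Assumption: each 2D query carries the index of the camera it belongs to.
    """
    return [[gi == gj for gj in group_ids] for gi in group_ids]


def masked_self_attention(queries, mask):
    """Single-head scaled dot-product self-attention restricted by `mask`.

    queries: list of N feature vectors (lists of floats), used as Q, K and V.
    """
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Only keys in the same group are scored; all others get zero weight.
        allowed = [j for j in range(len(queries)) if mask[i][j]]
        scores = [
            sum(a * b for a, b in zip(q, queries[j])) / math.sqrt(dim)
            for j in allowed
        ]
        # Numerically stable softmax over the allowed keys.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out = [0.0] * dim
        for w, j in zip(exps, allowed):
            for d in range(dim):
                out[d] += (w / total) * queries[j][d]
        outputs.append(out)
    return outputs


# Hypothetical example: four 2D queries, two per camera.
camera_of_query = [0, 0, 1, 1]
queries = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 4.0]]
mask = group_attention_mask(camera_of_query)
updated = masked_self_attention(queries, mask)
```

With this mask, each updated query is a weighted average of the queries from its own camera only, so the two queries in camera 0 are unaffected by the features of the queries in camera 1. In a real decoder the same mask would typically be passed to a multi-head attention layer rather than computed with explicit loops.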