MonoDet-K: A Monocular 3D Object Detector on BEV with Keypoint Regression

2023 4th International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), 2023

Abstract
Monocular 3D object detection, a challenging task central to autonomous driving, aims to infer accurate 3D predictions from a single 2D image. Existing approaches typically use conventional 2D object detectors to localize objects by their centers and predict 3D attributes from the features neighboring those centers. Such local features, however, capture neither the scene-level 3D spatial structure nor the inter-object depth relations that contextual cues provide. Transformer-based methods, meanwhile, usually ignore geometric relationships. In response, this paper introduces MonoDet-K, a Transformer-based 3D detector with camera-based keypoint regression. In this network, 3D object candidates are represented as a set of queries, and a depth-aware encoder uses attention mechanisms to generate non-local depth embeddings of the input image. Our proposed depth-guided decoder incorporates depth cross-attention modules, which enable both inter-query and query-scene depth feature interactions. Each object query can therefore estimate its 3D attributes adaptively from depth-guided regions of the image rather than relying solely on neighboring visual features. MonoDet-K is an end-to-end network that requires no additional data. It achieves state-of-the-art performance on the KITTI benchmark, and extensive ablation studies demonstrate the efficacy of the approach, underscoring its potential as a Transformer-based baseline for future research in monocular 3D object detection. The framework thereby addresses key challenges of monocular 3D object detection for autonomous driving.
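The query-based decoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attend`, `depth_guided_decoder_layer`), the order of the three attention steps, the separate visual cross-attention step, and the omission of learned projections, multiple heads, and normalization are all assumptions made to keep the example short. It only shows how a set of object queries could gather information from non-local depth embeddings, from each other, and from image features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections (simplification)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        # Similarity of this query to every key, turned into attention weights.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        # Weighted mix of the value vectors, one output dimension at a time.
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def add(xs, ys):
    """Element-wise residual addition of two lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def depth_guided_decoder_layer(queries, depth_embed, visual_feat):
    """Hypothetical single decoder layer; the step order is an assumption."""
    # Query-scene depth interaction: each query reads the depth embeddings.
    queries = add(queries, attend(queries, depth_embed, depth_embed))
    # Inter-query interaction: queries exchange information with each other.
    queries = add(queries, attend(queries, queries, queries))
    # Visual cross-attention (assumed): queries read the image features.
    queries = add(queries, attend(queries, visual_feat, visual_feat))
    return queries

# Toy inputs: 2 object queries, 6 scene positions, feature width 4.
object_queries = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
scene = [[(i + j) / 10 for j in range(4)] for i in range(6)]
updated = depth_guided_decoder_layer(object_queries, scene, scene)
print(len(updated), len(updated[0]))
```

In this sketch every query attends to all scene positions, which is the sense in which it is not limited to features neighboring an object center. A real detector would add prediction heads on top of the updated queries to regress 3D attributes.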
Keywords
3D object detection, Transformer, autodrive, deep-learning