HFT: Lifting Perspective Representations via Hybrid Feature Transformation for BEV Perception

ICRA(2023)

引用 1|浏览8
暂无评分
摘要
Restoring an accurate Bird's Eye View (BEV) map plays a crucial role in the perception of autonomous driving. The existing works of lifting representations from frontal view to BEV can be classified into two categories, i.e., Camera model-Based Feature Transformation (CBFT) and Camera model-Free Feature Transformation (CFFT). We empirically analyze the significant differences between CBFT and CFFT. The former method lift perspective features based on the flat- world assumption, which often causes distortion of regions lying above the ground plane. The latter method is limited in the perception performance due to the absence of geometric priors and time-consuming computing. In this paper, we propose a novel framework with a Hybrid Feature Transformation module (HFT) to lift perspective representations. Furthermore, we design a mutual learning scheme to augment hybrid transformation. The deformable attention mechanism enables the model to pay more attention to relevant regions and capture features with more semantics. We illustrate the effectiveness of HFT in BEV perception tasks, such as segmentation and object detection. Notably, in the task of semantic segmentation, extensive experiments demonstrate that HFT outperforms the previous state-of-the-art method by relatively 17.9% on the Argoverse and 22.0% on the KITTI 3D Object dataset. With negligible computing budget, HFT outperforms existing image- based methods on 3D object detection. The code will be released soon.
更多
查看译文
关键词
accurate Bird's Eye View,BEV perception tasks,capture features,CBFT,CFFT,frontal view,HFT,Hybrid Feature Transformation module,hybrid transformation,lifting perspective representations,method lift perspective features,perception performance
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要