MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders
arXiv (2024)
Abstract
Monocular 3D object detection aims for precise 3D localization and
identification of objects from a single-view image. Despite recent
progress, it often struggles with pervasive object occlusions, which
complicate and degrade the prediction of object dimensions, depths, and
orientations. We design MonoMAE, a monocular 3D detector inspired by Masked
Autoencoders that addresses the object occlusion issue by masking and
reconstructing objects in the feature space. MonoMAE consists of two novel
designs. The first is depth-aware masking that selectively masks certain parts
of non-occluded object queries in the feature space for simulating occluded
object queries for network training. It masks non-occluded object queries by
balancing the masked and preserved query portions adaptively according to the
depth information. The second is lightweight query completion that works with
the depth-aware masking to learn to reconstruct and complete the masked object
queries. With the proposed depth-aware masking and query completion, MonoMAE learns
enriched 3D representations that achieve superior monocular 3D detection
performance qualitatively and quantitatively for both occluded and non-occluded
objects. Additionally, MonoMAE learns generalizable representations that can
work well in new domains.
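
The depth-aware masking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that the masked and preserved portions are balanced according to depth, so the linear depth-to-ratio mapping, its direction (nearer objects masked more), the helper name depth_aware_mask, and the parameters max_depth, min_ratio, and max_ratio are all assumptions made for illustration. A query is modeled here as a plain list of floats.

```python
import random


def depth_aware_mask(query, depth, max_depth=60.0,
                     min_ratio=0.1, max_ratio=0.9, rng=None):
    """Zero out a depth-dependent fraction of a query feature vector.

    Assumption (not stated in the abstract): the mask ratio falls
    linearly from max_ratio at depth 0 to min_ratio at max_depth.
    Returns the masked vector and a boolean mask (True = masked).
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    d = min(max(depth, 0.0), max_depth)  # clamp depth to [0, max_depth]
    ratio = max_ratio - (max_ratio - min_ratio) * d / max_depth
    n_mask = round(ratio * len(query))
    masked_idx = set(rng.sample(range(len(query)), n_mask))
    mask = [i in masked_idx for i in range(len(query))]
    masked = [0.0 if m else v for v, m in zip(query, mask)]
    return masked, mask


# Hypothetical usage: one non-occluded query at two different depths.
query = [1.0] * 10
near_masked, near_mask = depth_aware_mask(query, depth=0.0)
far_masked, far_mask = depth_aware_mask(query, depth=60.0)
```

In a training loop, the masked vector would stand in for an occluded query and the original vector would serve as the reconstruction target for the completion module. The learned completion network itself is outside the scope of this sketch.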