Context and Geometry Aware Voxel Transformer for Semantic Scene Completion
CoRR (2024)
Abstract
Vision-based Semantic Scene Completion (SSC) has gained much attention due to
its widespread applications in various 3D perception tasks. Existing
sparse-to-dense approaches typically employ shared context-independent queries
across all input images. Because the focal regions of different inputs vary,
such queries fail to capture the distinctions among them and may lead to
undirected feature aggregation in cross-attention. Additionally, the absence of depth information
may lead to points projected onto the image plane sharing the same 2D position
or similar sampling points in the feature map, resulting in depth ambiguity. In
this paper, we present a novel context and geometry aware voxel transformer. It
utilizes a context aware query generator to initialize context-dependent
queries tailored to individual input images, effectively capturing their unique
characteristics and aggregating information within the region of interest.
Furthermore, it extends deformable cross-attention from 2D to 3D pixel space,
enabling the differentiation of points with similar image coordinates based on
their depth coordinates. Building upon this module, we introduce a neural
network named CGFormer for semantic scene completion. In addition,
CGFormer leverages multiple 3D representations (i.e., voxel and TPV) to boost
the semantic and geometric representation abilities of the transformed 3D
volume from both local and global perspectives. Experimental results
demonstrate that CGFormer achieves state-of-the-art performance on the
SemanticKITTI and SSCBench-KITTI-360 benchmarks, attaining mIoU scores of 16.87
and 20.05 and IoU scores of 45.99 and 48.07, respectively. Remarkably, CGFormer
even outperforms approaches employing temporal images as inputs or much larger
image backbone networks. Code for the proposed method is available at
https://github.com/pkqbajng/CGFormer.
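
The depth ambiguity mentioned in the abstract can be illustrated with a small
pinhole-camera sketch. This is not taken from the CGFormer code: the function
name `project` and the intrinsics (fx, fy, cx, cy) are hypothetical values
chosen for illustration only. Two 3D points on the same camera ray land on the
same 2D pixel, and only the added depth coordinate tells them apart, which is
the motivation for sampling in a 3D pixel space (u, v, d) rather than in 2D.

```python
# Illustrative sketch only; intrinsics are assumed values, not CGFormer's.
def project(point, fx=700.0, fy=700.0, cx=600.0, cy=180.0):
    """Pinhole projection of a camera-frame point (x, y, z) to (u, v, depth)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v, z


# Two points on the same viewing ray: the second is twice as far away.
near = (1.0, 0.5, 5.0)
far = (2.0, 1.0, 10.0)

u1, v1, d1 = project(near)
u2, v2, d2 = project(far)

# In 2D pixel space the two points are indistinguishable.
same_2d = (u1, v1) == (u2, v2)
# In 3D pixel space (u, v, d) the depth coordinate separates them.
same_3d = same_2d and d1 == d2

print("2D positions equal:", same_2d)
print("3D positions equal:", same_3d)
```

With 2D sampling alone, both points would read the same image feature; a
depth-aware sampling location gives the attention module a way to treat them
differently.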