
MRFTrans: Multimodal Representation Fusion Transformer for monocular 3D semantic scene completion

Rongtao Xu, Jiguang Zhang, Jiaxi Sun, Changwei Wang, Yifan Wu, Shibiao Xu, Weiliang Meng, Xiaopeng Zhang

Information Fusion (2024)

Abstract
The complete understanding of 3D scenes is crucial in robotic visual perception, impacting tasks such as motion planning and map localization. However, due to the limited field of view of sensors and occlusions within the scene, inferring complete scene geometry and semantic information from restricted observations is challenging. In this work, we propose a novel Multimodal Representation Fusion Transformer framework (MRFTrans) that robustly fuses semantic, geometric occupancy, and depth representations for monocular-image-based scene completion. MRFTrans centers on an affinity representation fusion transformer, integrating geometric occupancy and semantic relationships within a transformer architecture. This integration enables the modeling of long-range dependencies within scenes for inferring missing information. Additionally, we present a depth representation fusion method that efficiently extracts reliable depth knowledge from biased monocular estimates. Extensive experiments demonstrate MRFTrans's superiority, setting a new benchmark on the SemanticKITTI and NYUv2 datasets. It significantly enhances completeness and accuracy, particularly for large structures, movable objects, and scene components with major occlusions. The results underscore the benefits of the affinity-aware transformer and robust depth fusion in monocular-image-based completion.
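
The abstract does not give implementation details; the following is a minimal, hypothetical sketch of what attention-based fusion of semantic and geometric-occupancy features could look like in a transformer block. The module name, token shapes, and layer layout are assumptions for illustration, not the authors' code.

import torch
import torch.nn as nn

class AffinityFusionBlock(nn.Module):
    """Hypothetical sketch: fuse semantic tokens with geometric-occupancy
    tokens via cross-attention, then model long-range scene dependencies
    with self-attention. Not the paper's implementation."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, sem_tokens, occ_tokens):
        # sem_tokens, occ_tokens: (batch, num_tokens, dim)
        # Cross-attention: semantic queries attend to occupancy keys/values,
        # injecting geometric cues into the semantic representation.
        fused, _ = self.cross_attn(sem_tokens, occ_tokens, occ_tokens)
        x = self.norm1(sem_tokens + fused)
        # Self-attention captures long-range dependencies across the scene,
        # which helps infer occluded or out-of-view regions.
        ctx, _ = self.self_attn(x, x, x)
        x = self.norm2(x + ctx)
        return x + self.ffn(x)

# Toy usage: 2 samples, 1024 tokens, 256-dim features.
sem = torch.randn(2, 1024, 256)
occ = torch.randn(2, 1024, 256)
out = AffinityFusionBlock()(sem, occ)  # (2, 1024, 256)
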
Keywords
Semantic scene completion, Transformer, Multimodal representation fusion