Chrome Extension
WeChat Mini Program
Use on ChatGLM

X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks

AAAI 2024(2024)

Cited 0|Views9
No score
Abstract
Referring 3D instance segmentation is a challenging task aimed at accurately segmenting a target instance within a 3D scene based on a given referring expression. However, previous methods have overlooked the distinct roles played by different words in referring expressions. Additionally, they have failed to incorporate the positional relationship within referring expressions with the spatial correlations in 3D scenes. To alleviate these issues, we present a novel model called X-RefSeg3D, which constructs a cross-modal graph for the input 3D scene and unites textual and spatial relationships for reasoning via graph neural networks. Our approach begins by capturing object-specific text features, which are then fused with the instance features to construct a comprehensive cross-modal scene graph. Subsequently, we integrate the obtained cross-modal features into graph neural networks, leveraging the K-nearest algorithm to derive explicit instructions from expressions and factual relationships in scenes. This enables the effective capture of higher-order relationships among instances, thereby enhancing feature fusion and facilitating reasoning. Finally, the refined feature undergoes a matching module to compute the ultimate matching score. Experimental results on ScanRefer demonstrate the effectiveness of our method, surpassing previous approaches by a substantial margin of +3.67% in terms of mIOU.
More
Translated text
Key words
CV: 3D Computer Vision,CV: Language and Vision
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined