Chrome Extension
WeChat Mini Program
Use on ChatGLM

Fusing differentiable rendering and language-image contrastive learning for superior zero-shot point cloud classification

Jinlong Xie,Long Cheng, Gang Wang, Min Hu,Zaiyang Yu, Minghua Du,Xin Ning

crossref(2024)

Cited 0|Views8
No score
Abstract
Zero-shot point cloud classification involves recognizing categories not encountered during training. Current models often exhibit reduced accuracy on unseen categories without 3D pre-training, emphasizing the need for improved precision and interoperability. We propose a novel approach integrating differentiable rendering with contrastive language–image pre-training. Initially, differentiable rendering autonomously learns representative viewpoints from the data, enabling the transformation of point clouds into multi-view images while preserving key visual information. This transformation facilitates optimized viewpoint selection during training, refining the final feature representation. Features are extracted from the multi-view images and integrated into a global multi-view feature using a cross-attention mechanism. On the textual side, a large language model (LLM) is provided with 3D heuristic prompts to generate 3D-specific text reflecting category-specific traits, from which textual features are derived. The LLM’s extensive pre-trained knowledge enables it to capture abstract notions and categorical features relevant to distinct point cloud categories. Visual and textual features are aligned in a unified embedding space, enabling zero-shot classification. Throughout training, the Structural Similarity Index (SSIM) is integrated into the loss function to encourage the model to discern more distinctive viewpoints, reduce redundancy in multi-view imagery, and enhance computational efficiency. Experimental results on the ModelNet10, ModelNet40, and ScanObjectNN datasets demonstrate classification accuracies of 75.68%, 66.42%, and 52.03%, respectively, surpassing prevailing methods in zero-shot point cloud classification accuracy.
More
Translated text
Key words
Computer vision,Differentiable rendering,Zero-shot learning,Language–image fusion,Point cloud classification
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined