Enhancing Real-Time Semantic Segmentation with Textual Knowledge of Pre-Trained Vision-Language Model: A Lightweight Approach

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2023)

Abstract
In this paper, we present a lightweight method for improving real-time semantic segmentation models by leveraging pre-trained vision-language models. Our approach uses the CLIP text encoder, which provides rich semantic embeddings for text labels, and distills its textual knowledge into the segmentation model. The proposed framework integrates image and text embeddings, enabling alignment of visual and textual information. In addition, we introduce learnable prompt embeddings to capture class-specific information and enhance the model's semantic understanding. To ensure efficient learning, we devise a two-stage training procedure: in the first stage, the segmentation backbone learns from fixed text embeddings, and in the second stage, the prompt embeddings are optimized. Extensive experiments and ablation studies demonstrate that our method significantly improves the performance of the real-time semantic segmentation model.
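The abstract does not give implementation details, so the following is only a minimal sketch of one common way to align image and text embeddings: each pixel embedding is scored against each class text embedding by temperature-scaled cosine similarity. The function names (`pixel_text_logits`, `predict_classes`, `trainable_groups`), the temperature value of 0.07, and the choice to freeze the backbone in the second stage are all assumptions made for illustration, not the authors' stated design. Plain Python lists stand in for the tensors a real model would use.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def pixel_text_logits(pixel_embeddings, text_embeddings, temperature=0.07):
    """Cosine similarity between every pixel and every class text embedding.

    pixel_embeddings: list of D-dimensional vectors, one per pixel.
    text_embeddings:  list of D-dimensional vectors, one per class label.
    temperature:      hypothetical scaling constant (an assumption here).
    """
    texts = [l2_normalize(t) for t in text_embeddings]
    logits = []
    for pixel in pixel_embeddings:
        p = l2_normalize(pixel)
        logits.append(
            [sum(a * b for a, b in zip(p, t)) / temperature for t in texts]
        )
    return logits


def predict_classes(logits):
    """Pick the class with the highest logit for each pixel."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def trainable_groups(stage):
    """Hypothetical two-stage schedule: which parameter groups get updated.

    Stage 1: the backbone learns against fixed text embeddings.
    Stage 2: the prompt embeddings are optimized. Freezing the backbone
    in this stage is an assumption; the abstract does not say.
    """
    if stage == 1:
        return {"segmentation_backbone": True, "prompt_embeddings": False}
    if stage == 2:
        return {"segmentation_backbone": False, "prompt_embeddings": True}
    raise ValueError("stage must be 1 or 2")


if __name__ == "__main__":
    pixels = [[1.0, 0.0], [0.0, 2.0]]   # two toy pixel embeddings
    texts = [[2.0, 0.0], [0.0, 1.0]]    # two toy class text embeddings
    print(predict_classes(pixel_text_logits(pixels, texts)))
```

In an actual training loop, the text embeddings would come from the CLIP text encoder, and the logits would feed a per-pixel cross-entropy loss; both steps are omitted here.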