USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
CVPR 2024 (2024)
Abstract
The open-vocabulary image segmentation task involves partitioning images into
semantically meaningful segments and classifying them with flexible
text-defined categories. Recent vision-based foundation models such as the
Segment Anything Model (SAM) have shown superior performance in generating
class-agnostic image segments. The main challenge in open-vocabulary image
segmentation now lies in accurately classifying these segments into
text-defined categories. In this paper, we introduce the Universal Segment
Embedding (USE) framework to address this challenge. This framework
comprises two key components: 1) a data pipeline designed to efficiently
curate a large amount of segment-text pairs at various granularities, and 2) a
universal segment embedding model that enables precise segment classification
into a vast range of text-defined categories. The USE model not only supports
open-vocabulary image segmentation but also facilitates other downstream tasks
(e.g., querying and ranking). Through comprehensive experimental studies on
semantic segmentation and part segmentation benchmarks, we demonstrate that the
USE framework outperforms state-of-the-art open-vocabulary segmentation
methods.
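
As a rough illustration of how segment embeddings could be matched to text-defined categories, the minimal sketch below assigns each segment the label whose text embedding has the highest cosine similarity. The abstract does not specify the matching rule, so cosine similarity is an assumption here, and the function names (`cosine`, `classify_segments`) and the toy 3-dimensional vectors are hypothetical placeholders rather than outputs of the USE model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_segments(segment_embeddings, text_embeddings):
    """Assign each segment the label with the most similar text embedding.

    Assumption: segments and category names live in a shared embedding
    space, and nearest-neighbor matching by cosine similarity is used.
    """
    labels = []
    for seg in segment_embeddings:
        best = max(text_embeddings,
                   key=lambda name: cosine(seg, text_embeddings[name]))
        labels.append(best)
    return labels

# Toy, hand-made vectors (hypothetical; a real system would obtain these
# from a segment encoder and a text encoder).
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "wheel": [0.0, 1.0, 0.0],
    "sky": [0.0, 0.0, 1.0],
}
segment_embeddings = [
    [0.9, 0.1, 0.0],   # closest to "cat"
    [0.1, 0.2, 0.95],  # closest to "sky"
]

predicted = classify_segments(segment_embeddings, text_embeddings)
print(predicted)
```

Because the category set is just a dictionary of text embeddings, the vocabulary can be changed at query time without retraining, which is the property that makes this kind of matching "open-vocabulary". The same similarity scores could also be sorted to rank segments against a text query.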