VRP-SAM: SAM with Visual Reference Prompt
CVPR 2024
Abstract
In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that
empowers the Segment Anything Model (SAM) to utilize annotated reference images
as prompts for segmentation, creating the VRP-SAM model. In essence, VRP-SAM
can utilize annotated reference images to comprehend specific objects and
segment those objects in the target image. Note that the VRP encoder
supports a variety of annotation formats for reference images, including
point, box, scribble, and mask.
VRP-SAM achieves a breakthrough within the SAM framework by extending its
versatility and applicability while preserving SAM's inherent strengths, thus
enhancing user-friendliness. To enhance the generalization ability of VRP-SAM,
the VRP encoder adopts a meta-learning strategy. To validate the effectiveness
of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO
datasets. Remarkably, VRP-SAM achieved state-of-the-art performance in visual
reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM
demonstrates strong generalization capabilities, allowing it to perform
segmentation of unseen objects and enabling cross-domain segmentation.
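The core idea of a visual reference prompt can be illustrated with a toy sketch: pool the features of the annotated region in the reference image into a prompt embedding, then match that embedding against each pixel of the target image. This is only a minimal NumPy illustration of the concept, assuming hypothetical helper names and pre-extracted feature maps; the paper's actual VRP encoder is a learned module that produces prompt embeddings for SAM's mask decoder.

```python
import numpy as np

def vrp_prompt_from_reference(ref_feats, ref_mask):
    """Masked average pooling: collapse the annotated reference pixels
    into a single prompt embedding (a prototype). Hypothetical helper,
    not the paper's implementation."""
    # ref_feats: (H, W, C) feature map; ref_mask: (H, W) binary annotation
    w = ref_mask[..., None]
    return (ref_feats * w).sum(axis=(0, 1)) / max(w.sum(), 1e-6)

def segment_target(tgt_feats, prompt, threshold=0.5):
    """Cosine similarity between the prompt and every target pixel,
    thresholded into a binary segmentation mask."""
    p = prompt / (np.linalg.norm(prompt) + 1e-6)
    f = tgt_feats / (np.linalg.norm(tgt_feats, axis=-1, keepdims=True) + 1e-6)
    sim = f @ p
    return (sim > threshold).astype(np.uint8), sim

# Toy data: "object" pixels share one embedding, background another.
obj, bg = np.array([1.0, 0.0]), np.array([0.0, 1.0])
H, W = 4, 4
ref_feats = np.empty((H, W, 2)); ref_feats[:2] = obj; ref_feats[2:] = bg
ref_mask = np.zeros((H, W)); ref_mask[:2] = 1          # object in top half
tgt_feats = np.empty((H, W, 2))
tgt_feats[:, :2] = obj; tgt_feats[:, 2:] = bg          # object on the left

prompt = vrp_prompt_from_reference(ref_feats, ref_mask)
pred, sim = segment_target(tgt_feats, prompt)
print(pred)  # left two columns segmented as the object
```

In VRP-SAM the prompt is not a single hand-crafted prototype: the VRP encoder is trained with a meta-learning strategy so the produced embeddings generalize to unseen classes, and they are consumed by SAM's frozen mask decoder rather than by a similarity threshold.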