Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model
CoRR (2024)
Abstract
The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot
capability and flexible geometric prompting in general image segmentation.
However, SAM often struggles when handling various unconventional images, such
as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a
ConditionAl Tuning network that adapts SAM toward various unconventional target
tasks with just few-shot target samples. CAT-SAM freezes the entire SAM and
adapts its mask decoder and image encoder simultaneously with a small number of
learnable parameters. The core design is a prompt bridge structure that enables
decoder-conditioned joint tuning of the heavyweight image encoder and the
lightweight mask decoder. The bridging maps the prompt token of the mask
decoder to the image encoder, fostering synergic adaptation of the encoder and
the decoder with mutual benefits. We develop two representative tuning
strategies for the image encoder, which lead to two CAT-SAM variants: one
injecting learnable prompt tokens into the input space and the other inserting
lightweight adapter networks. Extensive experiments over 11 unconventional
tasks show that both CAT-SAM variants achieve superior target segmentation
performance consistently even under the very challenging one-shot adaptation
setup. Project page: https://xiaoaoran.github.io/projects/CAT-SAM
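
The prompt-bridge idea can be illustrated with a toy, dependency-free sketch of the prompt-token variant. It is not the authors' implementation: the class name ToyPromptBridge, the single linear projection, the token widths, and the choice to prepend the bridged token are all assumptions made for illustration. The abstract only states that the bridge maps the mask decoder's prompt token to the image encoder while SAM itself stays frozen.

```python
import random


def linear(weights, vector):
    """Bias-free dense layer; weights has shape (out_dim x in_dim)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


class ToyPromptBridge:
    """Hypothetical stand-in for a prompt bridge (prompt-token variant).

    Assumption: the bridge is one learnable projection from the decoder
    token width to the encoder token width. The real design may differ.
    """

    def __init__(self, decoder_dim, encoder_dim, seed=0):
        rng = random.Random(seed)
        # Learnable prompt token that lives in the mask decoder.
        self.decoder_prompt = [rng.uniform(-1.0, 1.0) for _ in range(decoder_dim)]
        # Learnable bridge weights (encoder_dim x decoder_dim).
        self.bridge = [
            [rng.uniform(-0.1, 0.1) for _ in range(decoder_dim)]
            for _ in range(encoder_dim)
        ]

    def encoder_prompt(self):
        """Map the decoder prompt token into the encoder's token space."""
        return linear(self.bridge, self.decoder_prompt)

    def condition_encoder_input(self, patch_tokens):
        """Prepend the bridged token; the patch tokens stay unchanged."""
        return [self.encoder_prompt()] + [list(t) for t in patch_tokens]

    def num_learnable(self):
        """Count the parameters that tuning would update in this toy."""
        return len(self.decoder_prompt) + sum(len(row) for row in self.bridge)


# Illustrative widths only; they are not SAM's actual dimensions.
DECODER_DIM = 4
ENCODER_DIM = 6

bridge = ToyPromptBridge(DECODER_DIM, ENCODER_DIM)
patch_tokens = [[0.0] * ENCODER_DIM for _ in range(3)]  # stand-in for frozen patch embeddings
tokens = bridge.condition_encoder_input(patch_tokens)

print(len(tokens), "encoder tokens;", bridge.num_learnable(), "learnable parameters")
```

In this sketch the only trainable quantities are the decoder prompt token and the bridge weights, which mirrors the abstract's claim that adaptation uses a small number of learnable parameters on top of a frozen model. The adapter variant would instead insert small learnable modules inside the encoder and is not shown here.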