Efficient Transfer Learning for Visual Tasks via Continuous Optimization of Prompts

IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT I(2022)

Abstract
Traditional methods for adapting pre-trained vision models to downstream tasks involve fine-tuning some or all of the model's parameters, which entails a trade-off. When too many parameters are fine-tuned, the model may lose the benefits associated with pre-training, such as the ability to generalize to out-of-distribution data; when too few parameters are fine-tuned, the model may be unable to adapt effectively to the downstream tasks. In this paper, we propose Visual Prompt Tuning (VPT) as an alternative to fine-tuning for Transformer-based vision models. Our method is closely related to, and inspired by, prefix-tuning of language models [22]. We find that, by adding additional parameters to a pre-trained model, VPT offers performance similar to fine-tuning the final layer. In addition, in low-data settings and on specialized tasks, such as traffic sign recognition, satellite photo recognition, and handwriting classification, VPT improves the performance of Transformer-based vision models.
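The core mechanism described above — keeping the pre-trained model frozen and learning only a small set of prompt vectors prepended to the input sequence — can be sketched as follows. This is an illustrative sketch, not the paper's exact implementation: the dimensions (196 patches, 768-dim embeddings, 8 prompts) and the [CLS]-then-prompts-then-patches layout are assumptions based on standard ViT conventions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper)
num_patches, embed_dim, num_prompts = 196, 768, 8

# Frozen patch embeddings from a pre-trained ViT (stand-in random values here)
patch_embeddings = rng.standard_normal((num_patches, embed_dim))

# Learnable prompt vectors: the only parameters updated during tuning
prompt_tokens = rng.standard_normal((num_prompts, embed_dim)) * 0.02

# [CLS] token, as in the standard ViT input layout (zeros as a stand-in)
cls_token = np.zeros((1, embed_dim))

# Prepend prompts between [CLS] and the patch sequence; the combined
# sequence [CLS; P_1..P_k; E_1..E_n] is fed to the frozen Transformer encoder
sequence = np.concatenate([cls_token, prompt_tokens, patch_embeddings], axis=0)

print(sequence.shape)  # (1 + num_prompts + num_patches, embed_dim)
```

Because gradients would flow only into `prompt_tokens`, the number of trainable parameters is `num_prompts * embed_dim`, a tiny fraction of the full model — which is what lets the frozen backbone retain its pre-training benefits.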
Keywords
Computer vision, Few-shot, Fine-tuning, Prompt engineering, Prefix-tuning, CLIP, Transformers, Vision transformers