
Task Bias in Contrastive Vision-Language Models

International Journal of Computer Vision (2023)

Abstract
Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision. We conduct an in-depth exploration of the CLIP model and show that its visual representation is often strongly biased towards solving some tasks more than others. Moreover, which task the representation will be biased towards is unpredictable, with little consistency across images. To resolve this task bias, we show how to learn a 'task guidance token' that can be appended to the input to prompt the representation towards features relevant to the task of interest. Our results show that this task guidance can be independent of the input image and still effectively provide a conditioning mechanism to steer visual representations towards the desired task.
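The abstract describes appending a learned, input-independent 'task guidance token' to the model input so that the visual representation is steered towards one task. A minimal sketch of that idea, assuming a ViT-style encoder that consumes a sequence of patch embeddings (all dimensions, the task names, and the `with_task_guidance` helper are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): 49 patch tokens, embedding dim 8.
num_patches, dim = 49, 8
patch_embeddings = rng.normal(size=(num_patches, dim))

# A task guidance token is just one extra learned embedding per task of
# interest; crucially it does not depend on the input image.
task_tokens = {
    "classification": rng.normal(size=(1, dim)),
    "segmentation": rng.normal(size=(1, dim)),
}

def with_task_guidance(patches: np.ndarray, task: str) -> np.ndarray:
    """Prepend the task's guidance token to the patch sequence; the combined
    sequence is then fed to the (frozen) vision transformer."""
    return np.concatenate([task_tokens[task], patches], axis=0)

seq = with_task_guidance(patch_embeddings, "classification")
# The guided sequence is exactly one token longer than the raw patch sequence,
# and its first row is the task token itself.
assert seq.shape == (num_patches + 1, dim)
assert np.array_equal(seq[0], task_tokens["classification"][0])
```

In training, only the task token vectors would be optimized (e.g. by backpropagating a task loss through a frozen CLIP encoder), which is what makes the conditioning mechanism cheap: one small vector per task, shared across all images.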
Keywords
Multitask, Task guidance, Representation learning, Zero-shot learning, Vision-language modeling