Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

CVPR 2024 (2024)

Cited 30 | Views 62
Keywords
Vision-language Models, Input Image, Single Image, Ability Of The Model, Multiple Images, Language Model, Adaptive Technique, Description Task, Lack Of Datasets, Adaptive Sampling, Single Input Image, Image Features, Validation Set, Object Detection, Visual Features, Global Features, Image Object, Bounding Box, Question Answering, Vision Tasks, Height Images, Width Of The Image, Visual Question Answering, Image Captioning, Reasoning Tasks, Image Embedding, Image Encoder, Serialized, Language Tasks