AffordanceLLM: Grounding Affordance from Vision Language Models
Computer Vision and Pattern Recognition (2024)
Key words
Language Model, Vision-language Models, Training Set, Object Affordances, Rich World, Similarity Measure, Single Image, Training Images, Object Classification, Latent Space, Depth Map, Additional Input, Vision Tasks, 3D Information, Training Objective, Depth Estimation, 3D Geometry, Machine Vision, World Knowledge, Image Encoder, Tokenized, Text Query, Special Token, Object Labels