Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors
CoRR(2024)
Abstract
Precise manipulation that generalizes across scenes and objects remains
a persistent challenge in robotics. Current approaches depend heavily on
large numbers of training instances to handle objects with pronounced visual
and/or geometric part ambiguities. Our work explores the grounding of
fine-grained part descriptors for precise manipulation in a zero-shot setting
by using web-trained text-to-image diffusion-based generative models. We frame
the problem as a dense semantic part correspondence task. Given a user-defined
click on a source image of a visually different instance of the same object,
our model returns a gripper pose for manipulating the corresponding part. We
require no manual grasping demonstrations, since we leverage intrinsic object
geometry and features. Real-world tabletop experiments validate the efficacy
of our approach and demonstrate its potential for advancing semantic-aware
robotic manipulation. Web page: https://tsagkas.github.io/click2grasp