Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation
CoRR(2024)
摘要
Robot learning of manipulation skills is hindered by the scarcity of diverse,
unbiased datasets. While curated datasets can help, challenges remain in
generalizability and real-world transfer. Meanwhile, large-scale "in-the-wild"
video datasets have driven progress in computer vision through self-supervised
techniques. Translating this to robotics, recent works have explored learning
manipulation skills by passively watching abundant videos sourced online.
Showing promising results, such video-based learning paradigms provide scalable
supervision while reducing dataset bias. This survey reviews foundations such
as video feature representation learning techniques, object affordance
understanding, 3D hand/body modeling, and large-scale robot resources, as well
as emerging techniques for acquiring robot manipulation skills from
uncontrolled video demonstrations. We discuss how learning only from observing
large-scale human videos can enhance generalization and sample efficiency for
robotic manipulation. The survey summarizes video-based learning approaches,
analyses their benefits over standard datasets, survey metrics, and benchmarks,
and discusses open challenges and future directions in this nascent domain at
the intersection of computer vision, natural language processing, and robot
learning.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要