Waypoint-Based Reinforcement Learning for Robot Manipulation Tasks
arXiv (2024)
Abstract
Robot arms should be able to learn new tasks. One framework here is
reinforcement learning, where the robot is given a reward function that encodes
the task, and the robot autonomously learns actions to maximize its reward.
Existing approaches to reinforcement learning often frame this problem as a
Markov decision process, and learn a policy (or a hierarchy of policies) to
complete the task. These policies reason over hundreds of fine-grained actions
that the robot arm needs to take: e.g., moving slightly to the right or
rotating the end-effector a few degrees. But the manipulation tasks that we
want robots to perform can often be broken down into a small number of
high-level motions: e.g., reaching an object or turning a handle. In this paper
we therefore propose a waypoint-based approach for model-free reinforcement
learning. Instead of learning a low-level policy, the robot now learns a
trajectory of waypoints, and then interpolates between those waypoints using
existing controllers. Our key novelty is framing this waypoint-based setting as
a sequence of multi-armed bandits: each bandit problem corresponds to one
waypoint along the robot's motion. We theoretically show that an ideal solution
to this reformulation has lower regret bounds than standard frameworks. We also
introduce an approximate posterior sampling solution that builds the robot's
motion one waypoint at a time. Results across benchmark simulations and two
real-world experiments suggest that this proposed approach learns new tasks
more quickly than state-of-the-art baselines. See videos here:
https://youtu.be/MMEd-lYfq4Y
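As a rough illustration of the idea described above — treating each waypoint along the motion as its own multi-armed bandit and selecting it with posterior (Thompson) sampling, committing to one waypoint before moving to the next — the following Python sketch builds a trajectory waypoint by waypoint. The reward stub `execute_trajectory`, the 3-D waypoint parameterization, the Gaussian posterior, and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical reward oracle: in the paper, the robot interpolates between the
# waypoints with an existing controller and observes a task reward. Here it is
# a stand-in (distance of the final waypoint to an assumed goal, plus noise)
# so the sketch runs on its own.
def execute_trajectory(waypoints):
    goal = np.array([0.5, 0.2, 0.3])                 # assumed task goal
    final = waypoints[-1] if waypoints else np.zeros(3)
    return -np.linalg.norm(final - goal) + 0.01 * np.random.randn()

def sample_candidate_waypoints(prev, num_candidates, rng):
    """Discrete arms for one bandit: candidate waypoints near the previous one."""
    return prev + 0.2 * rng.standard_normal((num_candidates, 3))

def learn_waypoints(num_waypoints=3, num_candidates=8,
                    episodes_per_waypoint=30, seed=0):
    rng = np.random.default_rng(seed)
    trajectory = []                                  # waypoints committed so far
    prev = np.zeros(3)                               # assumed start position
    for _ in range(num_waypoints):
        candidates = sample_candidate_waypoints(prev, num_candidates, rng)
        # Running Gaussian posterior over each arm's mean reward.
        mean = np.zeros(num_candidates)
        count = np.zeros(num_candidates)
        for _ in range(episodes_per_waypoint):
            # Posterior sampling: draw a plausible mean reward per arm,
            # then pull the arm whose draw is highest.
            sampled = rng.normal(mean, 1.0 / np.sqrt(count + 1.0))
            arm = int(np.argmax(sampled))
            reward = execute_trajectory(trajectory + [candidates[arm]])
            count[arm] += 1
            mean[arm] += (reward - mean[arm]) / count[arm]
        best = int(np.argmax(mean))
        trajectory.append(candidates[best])          # fix this waypoint, move on
        prev = candidates[best]
    return trajectory

if __name__ == "__main__":
    print(learn_waypoints())
```

Because each waypoint is chosen by a bandit over a small candidate set rather than by a low-level policy over hundreds of fine-grained actions, the search space per decision is much smaller, which is the intuition behind the regret advantage the abstract claims.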