Supervised Meta-Reinforcement Learning With Trajectory Optimization for Manipulation Tasks

IEEE Transactions on Cognitive and Developmental Systems (2024)

Abstract
Learning from small amounts of samples with reinforcement learning (RL) is challenging in many tasks, especially in real-world applications such as robotics. Meta-reinforcement learning (meta-RL) has been proposed to address this problem by generalizing to new tasks through experience from previous similar tasks. However, these approaches generally perform meta-optimization by applying direct policy search methods to validation samples from adapted policies, and thus require large amounts of on-policy samples during meta-training. To address this, we propose a novel algorithm called supervised meta-RL with trajectory optimization (SMRL-TO), which integrates model-agnostic meta-learning (MAML) with iterative LQR (iLQR)-based trajectory optimization. Our approach provides online supervision for validation samples through iLQR-based trajectory optimization and embeds simple imitation learning into the meta-optimization in place of policy gradient steps. The result is a bi-level optimization that computes several gradient updates in each meta-iteration, with off-policy RL in the inner loop and online imitation learning in the outer loop. SMRL-TO achieves significant improvements in sample efficiency without human-provided demonstrations, owing to the effective supervision from iLQR-based trajectory optimization. In this article, we describe how to use iLQR-based trajectory optimization to obtain labeled data and how to leverage that data to assist the training of the meta-learner. Through a series of robotic manipulation tasks, we further show that, compared with previous methods, the proposed approach substantially improves sample efficiency and achieves better asymptotic performance.
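The bi-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation; it is a minimal example under several assumptions that do not come from the paper: a linear double-integrator system, a linear policy u = W x, a single gradient step on a one-step quadratic cost standing in for the inner-loop off-policy RL update, plain LQR standing in for iLQR (on linear dynamics with quadratic cost the two coincide), and a first-order approximation of the MAML meta-gradient. All names (`TASKS`, `make_B`, `lqr_gain`, `inner_step`, `meta_train`, `eval_loss`) are hypothetical.

```python
import numpy as np

DT, HORIZON = 0.1, 20
Q, R = np.eye(2), np.array([[0.1]])      # quadratic state and action costs (assumed)
A = np.array([[1.0, DT], [0.0, 1.0]])    # toy double integrator (assumed dynamics)
TASKS = [0.5, 1.0, 1.5]                  # hypothetical per-task actuator gains

def make_B(b):
    return np.array([[0.0], [DT * b]])

def lqr_gain(B, steps=50):
    """Finite-horizon LQR backward pass; returns the first-step gain K (u = -K x)."""
    P = Q.copy()
    for _ in range(steps):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def rollout(W, B, rng, n=16):
    """States visited by the linear policy u = W x; shape (n * HORIZON, 2)."""
    x = rng.normal(size=(n, 2))
    states = []
    for _ in range(HORIZON):
        states.append(x)
        x = x @ A.T + (x @ W.T) @ B.T
    return np.concatenate(states)

def inner_step(W, B, X, alpha=0.01):
    """Stand-in for the inner-loop RL update: one gradient step on a one-step cost."""
    S = X.T @ X / len(X)                 # empirical E[x x^T]
    grad = 2.0 * (B.T @ Q @ (A + B @ W) + R @ W) @ S
    return W - alpha * grad

def imitation_grad(W, K, X):
    """Gradient of the behavior-cloning loss mean ||W x - (-K x)||^2 w.r.t. W."""
    err = X @ (W + K).T                  # action error against the expert label
    return 2.0 * err.T @ X / len(X)

def meta_train(W, iters=300, beta=0.02, seed=0):
    rng = np.random.default_rng(seed)
    gains = [lqr_gain(make_B(b)) for b in TASKS]
    for _ in range(iters):
        g = np.zeros_like(W)
        for b, K in zip(TASKS, gains):
            B = make_B(b)
            W_task = inner_step(W, B, rollout(W, B, rng))  # inner loop: adapt
            X_val = rollout(W_task, B, rng)                # validation samples
            g += imitation_grad(W_task, K, X_val)          # supervised by u* = -K x
        W = W - beta * g / len(TASKS)    # first-order meta-update (a simplification)
    return W

def eval_loss(W, seed=1):
    """Mean post-adaptation imitation loss over tasks on fixed test states."""
    X = np.random.default_rng(seed).normal(size=(256, 2))
    losses = []
    for b in TASKS:
        B = make_B(b)
        W_task = inner_step(W, B, X)
        losses.append(np.mean((X @ (W_task + lqr_gain(B)).T) ** 2))
    return float(np.mean(losses))
```

Calling `meta_train(np.zeros((1, 2)))` returns a meta-learned initialization, and `eval_loss` compares how closely the adapted policy matches the trajectory-optimizer labels before and after meta-training. The point of the sketch is the data flow: the outer loop never uses a policy gradient on validation rollouts, only a supervised loss against actions produced by the trajectory optimizer.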
Keywords
Task analysis, Trajectory optimization, Robots, Heuristic algorithms, Training, Complexity theory, Dynamical systems, Iterative LQR (iLQR), Meta learning, Reinforcement learning (RL), Robotic manipulation