Off-Policy Adversarial Imitation Learning for Robotic Tasks with Low-Quality Demonstrations

Applied Soft Computing (2020)

Abstract
The goal of imitation learning (IL) is to enable a robot to imitate expert behavior given expert demonstrations. Adversarial imitation learning (AIL) is a recent, successful IL architecture that has shown significant progress on complex continuous tasks, particularly robotic tasks. However, in most cases the acquisition of high-quality demonstrations is costly and laborious, which poses a significant challenge for AIL methods. Although generative adversarial imitation learning (GAIL) and its extensions have been shown to be robust to sub-optimal experts, it is difficult for them to surpass the performance of those experts by a large margin. To address this issue, in this paper we propose a novel off-policy AIL method called robust adversarial imitation learning (RAIL). To enable the agent to significantly outperform a sub-optimal expert providing demonstrations, the hindsight idea of variable reward (VR) is first incorporated into the off-policy AIL framework. Then, a strategy called hindsight copy (HC) of demonstrations is designed to provide the discriminator and the trained policy in the AIL framework with different demonstrations, so as to maximize the use of such demonstrations and speed up learning. Experiments were conducted on two multi-goal robotic tasks to test the proposed method. The results show that our method is not limited by the quality of expert demonstrations and can outperform other IL approaches.
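The AIL framework the abstract builds on rewards the policy according to how "expert-like" a discriminator judges its state-action pairs. Below is a minimal, self-contained toy sketch of that GAIL-style surrogate reward, assuming a simple logistic discriminator and synthetic NumPy data; the shapes, data, and training loop are illustrative and are not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Synthetic stand-ins: "expert" state-action pairs cluster around +1,
# the current policy's pairs cluster around -1 (4-dim features).
expert = rng.normal(loc=1.0, scale=0.3, size=(128, 4))
policy = rng.normal(loc=-1.0, scale=0.3, size=(128, 4))

# Logistic discriminator D(s, a): trained to output 1 on expert pairs
# and 0 on policy pairs (binary cross-entropy, gradient descent).
w = np.zeros(4)
b = 0.0
lr = 0.5
for _ in range(200):
    de = sigmoid(expert @ w + b)   # D on expert batch (label 1)
    dp = sigmoid(policy @ w + b)   # D on policy batch (label 0)
    grad_w = expert.T @ (de - 1.0) / len(expert) + policy.T @ dp / len(policy)
    grad_b = np.mean(de - 1.0) + np.mean(dp)
    w -= lr * grad_w
    b -= lr * grad_b

def ail_reward(sa):
    """GAIL-style surrogate reward: larger when D thinks (s, a) is expert."""
    d = sigmoid(sa @ w + b)
    return -np.log(1.0 - d + 1e-8)

# Expert-like behavior should earn a higher surrogate reward.
print(ail_reward(expert).mean() > ail_reward(policy).mean())
```

In the paper's off-policy setting this surrogate reward is what the hindsight variable-reward relabeling modifies, so that even trajectories from a sub-optimal expert provide useful learning signal.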
Keywords
Robot learning, Demonstrations, Robust adversarial imitation learning, Hindsight copy, Variable reward