Imitation Learning Method of Multi-quality Expert Data Based on GAIL

Dengmin Xiao, Bo Wang, Zhongqi Sun, Xiao He

2023 China Automation Congress (CAC), 2023

Abstract
This paper studies imitation learning from multi-quality expert data based on Generative Adversarial Imitation Learning (GAIL). With GAIL, an agent acquires a high-quality behavioral policy by imitating expert actions and learning the distribution of expert experience rather than a reward function. Considering that imitation learning from multi-quality experts can achieve the effect of data augmentation, we propose a novel GAIL-based method named MT-GAIL. We first define a reliability coefficient for each expert's data by calculating the accuracy of the corresponding discriminator. The reliability coefficients are then used as weights in the reward function, which is defined as the sum of the products of each weight and the output of the corresponding discriminator. The resulting sequences of rewards, states, and actions are finally fed into the experience pool to train the policy builder's network. We compare the method with GAIL in experiments on single-expert and multi-quality expert trajectories, and the results show that MT-GAIL is capable of avoiding the worst expert data. Experiments on the effects of different reward calculation methods on multi-quality expert data further illustrate the advantage of the proposed discriminator-output weighting method.
Keywords
GAIL, multi-quality expert data, MuJoCo, imitation learning
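
The reward weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract states only that each expert's reliability coefficient comes from the accuracy of its discriminator and that the reward is the weighted sum of the discriminator outputs. The normalization of accuracies into weights that sum to 1, the function names `reliability_weights` and `weighted_reward`, and the example numbers are all assumptions made for illustration.

```python
import numpy as np


def reliability_weights(accuracies):
    """Turn per-expert discriminator accuracies into weights.

    Assumption: the weights are the accuracies normalized to sum to 1.
    The abstract does not specify the exact mapping.
    """
    acc = np.asarray(accuracies, dtype=float)
    return acc / acc.sum()


def weighted_reward(disc_outputs, weights):
    """Sum of (weight * discriminator output) over experts.

    disc_outputs: array of shape (n_experts, batch), one row per
                  expert-specific discriminator.
    weights:      array of shape (n_experts,).
    Returns an array of shape (batch,), one reward per state-action pair.
    """
    d = np.asarray(disc_outputs, dtype=float)
    w = np.asarray(weights, dtype=float)
    return w @ d


# Hypothetical example: three experts, two state-action pairs.
weights = reliability_weights([0.9, 0.6, 0.5])
disc_outputs = [[0.8, 0.2],
                [0.5, 0.5],
                [0.1, 0.9]]
rewards = weighted_reward(disc_outputs, weights)
```

In this sketch, the discriminator with the highest accuracy contributes most to the reward. Whether higher accuracy should mean higher or lower reliability in MT-GAIL is not stated in the abstract.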