Policy Gradient From Demonstration and Curiosity

IEEE Transactions on Cybernetics (2023)

Abstract
With reinforcement learning, an agent can learn complex behaviors from high-level abstractions of the task. However, exploration and reward shaping remain challenging for existing methods, especially in scenarios where extrinsic feedback is sparse. Expert demonstrations have been investigated to address these difficulties, but a very large number of high-quality demonstrations is usually required. In this work, an integrated policy gradient algorithm is proposed to boost exploration and facilitate intrinsic reward learning from only a limited number of demonstrations. This is achieved by reformulating the original reward function with two additional terms: the first term measures the Jensen-Shannon divergence between the current policy and the expert's demonstrations, and the second term estimates the agent's uncertainty about the environment. The presented algorithm is evaluated on a range of simulated tasks with sparse extrinsic reward signals, where only a limited number of demonstrated trajectories is provided for each task. Superior exploration efficiency and high average return are demonstrated in all tasks. Furthermore, the agent is found to imitate the expert's behavior while sustaining a high return.
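The abstract describes a reward reformulation with a demonstration-matching term and a curiosity term added to the sparse extrinsic reward. The following is a minimal sketch of that idea, not the paper's implementation: it assumes a GAIL-style discriminator as a surrogate for the Jensen-Shannon divergence term and a forward dynamics model's prediction error as the uncertainty (curiosity) term. Names such as shaped_reward, lambda_demo, and lambda_curiosity are illustrative and do not come from the paper.

# Minimal sketch of the reward reformulation described in the abstract.
# Assumptions (not from the paper): the demonstration term is approximated
# GAIL-style by a discriminator D(s, a) whose log-odds relate to the
# Jensen-Shannon divergence between the policy and expert occupancy
# measures, and the curiosity term is the prediction error of a learned
# forward dynamics model. All names are illustrative.

import numpy as np


def shaped_reward(r_ext, s, a, s_next, discriminator, forward_model,
                  lambda_demo=0.1, lambda_curiosity=0.01):
    """Combine the extrinsic reward with demonstration and curiosity bonuses.

    r_ext         : scalar extrinsic (environment) reward, often sparse
    discriminator : callable (s, a) -> probability that (s, a) is expert-like
    forward_model : callable (s, a) -> predicted next state
    """
    # Demonstration bonus: higher when the discriminator believes the
    # transition could have come from the expert (JS-divergence surrogate).
    d = np.clip(discriminator(s, a), 1e-6, 1.0 - 1e-6)
    r_demo = np.log(d) - np.log(1.0 - d)

    # Curiosity bonus: prediction error of the forward model, i.e. the
    # agent's uncertainty about the environment dynamics.
    r_curiosity = np.sum((forward_model(s, a) - s_next) ** 2)

    return r_ext + lambda_demo * r_demo + lambda_curiosity * r_curiosity


if __name__ == "__main__":
    # Toy usage with stand-in models on a 4-dimensional state space.
    rng = np.random.default_rng(0)
    s, a, s_next = rng.normal(size=4), rng.normal(size=2), rng.normal(size=4)
    print(shaped_reward(
        r_ext=0.0, s=s, a=a, s_next=s_next,
        discriminator=lambda s, a: 0.7,   # stand-in expert classifier
        forward_model=lambda s, a: s,     # stand-in dynamics model
    ))

The shaped reward can then be plugged into any policy gradient update in place of the raw environment reward, which is the integration point the abstract suggests.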
Keywords
Games, Task analysis, Heuristic algorithms, Uncertainty, Trajectory, Predictive models, Training, Curiosity-driven exploration, learn from demonstration, policy gradient, reinforcement learning (RL)