Agent-Aware Training for Agent-Agnostic Action Advising in Deep Reinforcement Learning
CoRR(2023)
摘要
Action advising endeavors to leverage supplementary guidance from expert
teachers to alleviate the issue of sampling inefficiency in Deep Reinforcement
Learning (DRL). Previous agent-specific action advising methods are hindered by
imperfections in the agent itself, while agent-agnostic approaches exhibit
limited adaptability to the learning agent. In this study, we propose a novel
framework called Agent-Aware trAining yet Agent-Agnostic Action Advising (A7)
to strike a balance between the two. The underlying concept of A7 revolves
around utilizing the similarity of state features as an indicator for
soliciting advice. However, unlike prior methodologies, the measurement of
state feature similarity is performed by neither the error-prone learning agent
nor the agent-agnostic advisor. Instead, we employ a proxy model to extract
state features that are both discriminative (adaptive to the agent) and
generally applicable (robust to agent noise). Furthermore, we utilize behavior
cloning to train a model for reusing advice and introduce an intrinsic reward
for the advised samples to incentivize the utilization of expert guidance.
Experiments are conducted on the GridWorld, LunarLander, and six prominent
scenarios from Atari games. The results demonstrate that A7 significantly
accelerates the learning process and surpasses existing methods (both
agent-specific and agent-agnostic) by a substantial margin. Our code will be
made publicly available.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要