State-Dependent Exploration For Policy Gradient Methods

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART II, PROCEEDINGS (2008)

Abstract
Policy Gradient methods are model-free reinforcement learning algorithms that in recent years have been applied successfully to many real-world problems. Typically, Likelihood Ratio (LR) methods are used to estimate the gradient, but they suffer from high variance due to random exploration at every time step of each training episode. Our solution to this problem is to introduce a state-dependent exploration function (SDE) which, during an episode, returns the same action for any given state. This results in less variance per episode and faster convergence. SDE also finds solutions overlooked by other methods, and even improves upon state-of-the-art gradient estimators such as Natural Actor-Critic. We systematically derive SDE and apply it to several illustrative toy problems and a challenging robotics simulation task, where SDE greatly outperforms random exploration.
Keywords
random exploration,derive SDE,state-dependent exploration function,training episode,state-of-the-art gradient estimator,Likelihood Ratio,Natural Actor-Critic,Policy Gradient method,challenging robotics,faster convergence,Policy Gradient Methods,State-Dependent Exploration