Parameter Space Noise for Exploration

Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz

Semantic Scholar (2018)

Abstract
Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent’s parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods offers the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through an experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks.
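
The contrast between action-space and parameter-space noise can be illustrated with a minimal sketch. It assumes a toy linear policy and fixed Gaussian noise; the function names, dimensions, and the `sigma` value are hypothetical and chosen for illustration only, and this is not the authors' implementation for DQN, DDPG, or TRPO.

```python
import numpy as np

def act(weights, state):
    """Deterministic linear policy (an illustrative stand-in for a network)."""
    return weights @ state

def act_with_action_noise(weights, state, rng, sigma=0.1):
    """Action-space noise: fresh Gaussian noise is added to every action."""
    return act(weights, state) + rng.normal(0.0, sigma, size=weights.shape[0])

def perturb_parameters(weights, rng, sigma=0.1):
    """Parameter-space noise: return a noisy copy of the weights.

    Assumption for this sketch: the perturbation is sampled once and then
    held fixed for a whole episode.
    """
    return weights + rng.normal(0.0, sigma, size=weights.shape)

rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 3))  # hypothetical: 3-dim state -> 2-dim action
state = np.ones(3)

# Action-space noise: the same state can yield a different action each step.
a1 = act_with_action_noise(weights, state, rng)
a2 = act_with_action_noise(weights, state, rng)

# Parameter-space noise: perturb once, then act deterministically,
# so the same state always maps to the same action within the episode.
noisy_weights = perturb_parameters(weights, rng)
b1 = act(noisy_weights, state)
b2 = act(noisy_weights, state)
```

With the perturbed weights held fixed, repeated visits to the same state produce the same action, which is one way to read the abstract's claim of more consistent exploration.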