Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization
arXiv (2022)
Abstract
Advances in reinforcement learning (RL) often rely on massive compute
resources and remain notoriously sample inefficient. In contrast, the human
brain is able to efficiently learn effective control strategies using limited
resources. This raises the question of whether insights from neuroscience can be
used to improve current RL methods. Predictive processing is a popular
theoretical framework which maintains that the human brain is actively seeking
to minimize surprise. We show that recurrent neural networks which predict
their own sensory states can be leveraged to minimize surprise, yielding
substantial gains in cumulative reward. Specifically, we present the Predictive
Processing Proximal Policy Optimization (P4O) agent, an actor-critic
reinforcement learning agent that applies predictive processing to a recurrent
variant of the PPO algorithm by integrating a world model in its hidden state.
Even without hyperparameter tuning, P4O significantly outperforms a baseline
recurrent variant of the PPO algorithm on multiple Atari games using a single
GPU. It also outperforms other state-of-the-art agents given the same
wall-clock time and exceeds human gamer performance on multiple games including
Seaquest, which is a particularly challenging environment in the Atari domain.
Altogether, our work underscores how insights from the field of neuroscience
may support the development of more capable and efficient artificial agents.
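
The abstract does not spell out the training objective, so the following is only a minimal sketch of the general idea it describes: a PPO-style clipped policy objective combined with a "surprise" term that penalizes the error of the agent's own sensory predictions. It is not the authors' implementation. The function names, the mean-squared-error form of surprise, and the `surprise_weight` parameter are all assumptions made for illustration.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped objective for a single sample (to be maximized)."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def surprise(predicted, observed):
    """Assumed surprise measure: mean squared error between the predicted
    and the actually observed sensory encoding."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def combined_loss(ratios, advantages, predictions, observations,
                  surprise_weight=1.0, clip_eps=0.2):
    """Hypothetical combined loss (to be minimized): the negated mean PPO
    objective plus a weighted mean prediction error over the batch."""
    n = len(ratios)
    policy_term = sum(
        clipped_surrogate(r, a, clip_eps) for r, a in zip(ratios, advantages)
    ) / n
    surprise_term = sum(
        surprise(p, o) for p, o in zip(predictions, observations)
    ) / n
    return -policy_term + surprise_weight * surprise_term


# One-sample batch: the probability ratio 1.5 is clipped to 1.2, and the
# prediction misses one of two sensory components by 2.0.
loss = combined_loss(
    ratios=[1.5],
    advantages=[1.0],
    predictions=[[1.0, 2.0]],
    observations=[[1.0, 4.0]],
)
print(loss)
```

In this sketch, lowering the loss means both improving the clipped policy objective and reducing prediction error. In a recurrent agent, the predictions would come from the hidden state rather than being passed in directly.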