Value Prediction Network.

Junhyuk Oh, Satinder Singh, Honglak Lee

Advances in Neural Information Processing Systems 30 (NIPS 2017)

Abstract

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
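As an illustration of what short-lookahead planning over abstract states can look like, here is a minimal sketch. The abstract does not spell out the planning rule, so this is a generic depth-limited lookahead and not necessarily the authors' exact procedure. Every name below (`q_lookahead`, `plan`, `toy_transition`, `toy_outcome`, `toy_value`, `OPTIONS`) is hypothetical, and the hand-written toy functions stand in for what would be learned neural network modules.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical abstract state: a learned feature vector in practice,
# here a 1-tuple holding a position on a line.
State = Tuple[float, ...]
Option = int


def q_lookahead(
    state: State,
    option: Option,
    depth: int,
    transition: Callable[[State, Option], State],
    outcome: Callable[[State, Option], Tuple[float, float]],
    value: Callable[[State], float],
    options: Sequence[Option],
) -> float:
    """Estimate the value of taking `option` in `state` by expanding the
    model `depth` steps and bootstrapping from `value` at the leaves."""
    reward, discount = outcome(state, option)
    next_state = transition(state, option)
    if depth <= 1:
        return reward + discount * value(next_state)
    best_next = max(
        q_lookahead(next_state, o, depth - 1, transition, outcome, value, options)
        for o in options
    )
    return reward + discount * best_next


def plan(state, depth, transition, outcome, value, options) -> Option:
    """Pick the option with the highest lookahead estimate."""
    return max(
        options,
        key=lambda o: q_lookahead(state, o, depth, transition, outcome, value, options),
    )


# Toy stand-ins (assumptions for illustration only, not learned models).
OPTIONS = (-1, 1)  # step left or step right


def toy_transition(state: State, option: Option) -> State:
    return (state[0] + option,)


def toy_outcome(state: State, option: Option) -> Tuple[float, float]:
    # Reward 1.0 for arriving at position 3; fixed discount of 0.9.
    reward = 1.0 if state[0] + option == 3 else 0.0
    return reward, 0.9


def toy_value(state: State) -> float:
    return 0.0  # an uninformed value estimate


best = plan((0.0,), 3, toy_transition, toy_outcome, toy_value, OPTIONS)
print(best)
```

In this toy setting, the reward at position 3 is three steps away from the start, so a one-step lookahead with the uninformed value estimate cannot distinguish the two options, while a three-step lookahead can. Note that the model never predicts observations: it only predicts rewards, discounts, values, and next abstract states, which is the property the abstract emphasizes.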

Key words

Reinforcement Learning, Deep Learning, Support Vector Machines, Online Learning, Adaptive Algorithms
