
Beyond Exponentially Discounted Sum: Automatic Learning of Return Function.

CoRR (2019)

Abstract
In reinforcement learning, the return, defined as a weighted accumulation of future rewards, and the value, defined as the expected return, serve as the objectives that guide policy learning. In classic RL, the return is defined as the exponentially discounted sum of future rewards. A key insight is that there can be many feasible ways to define the form of the return function (and thus the value) from which the same optimal policy is derived, yet these different forms may yield dramatically different speeds of learning that policy. In this paper, we investigate how to modify the form of the return function to accelerate learning of the optimal policy. We propose a general mathematical form for the return function and employ meta-learning to learn the optimal return function in an end-to-end manner. We test our method on a specially designed maze environment and several Atari games, and our experimental results clearly indicate the advantages of automatically learning optimal return functions in reinforcement learning.
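
The classic return mentioned in the abstract is the exponentially discounted sum of future rewards, while the paper proposes replacing the fixed discount weights with a learned form. The sketch below contrasts the two under a simple assumption: the general return is written as a per-step weight vector `w` applied to future rewards. This parameterization and the names used here are illustrative, not the paper's exact formulation or meta-learning procedure.

```python
# Minimal sketch: classic discounted return vs. a parameterized return.
# The per-step weight vector `w` is an illustrative assumption, not the
# paper's exact parameterization.

import numpy as np

def discounted_return(rewards, gamma=0.99):
    """Classic return: G_t = sum_k gamma**k * r_{t+k}."""
    weights = gamma ** np.arange(len(rewards))
    return float(np.sum(weights * np.asarray(rewards)))

def general_return(rewards, w):
    """General return: G_t = sum_k w_k * r_{t+k}, where w would be
    produced by a meta-learner rather than fixed in advance."""
    w = np.asarray(w[: len(rewards)])
    return float(np.sum(w * np.asarray(rewards)))

rewards = [0.0, 0.0, 1.0, 0.0, 1.0]
gamma = 0.99

# With w_k = gamma**k the general form reduces to the classic return.
w_classic = gamma ** np.arange(len(rewards))
assert np.isclose(discounted_return(rewards, gamma),
                  general_return(rewards, w_classic))

# A meta-learner could instead adjust w directly, e.g. by gradient descent
# on how quickly the policy improves; these weights are hypothetical.
w_learned = np.array([1.0, 0.8, 0.9, 0.5, 0.7])
print(discounted_return(rewards, gamma), general_return(rewards, w_learned))
```

The point of the contrast is that both return definitions can rank the same policy as optimal, yet the shape of the weights changes the learning signal, which is what the paper's meta-learning approach exploits.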