
Model-free Reinforcement Learning that Transfers Using Random Reward Features

ICLR 2023
Abstract
Favorable reinforcement learning (RL) algorithms should not only be able to synthesize controllers for complex tasks, but also transfer across various such tasks. Classical model-free RL algorithms like Q-learning can be made stable and have the potential to solve complicated tasks individually. However, rewards are the key supervision signals in model-free approaches, which in general makes it challenging to transfer across multiple tasks with different reward functions. On the other hand, model-based RL algorithms naturally transfer to various reward functions if the transition dynamics are learned well. Unfortunately, model learning usually suffers from high-dimensional observations and/or long horizons due to the challenge of compounding errors. In this work, we propose a new way to transfer behaviors across problems with different reward functions that enjoys the best of both worlds. Specifically, we develop a model-free approach that implicitly learns the model without constructing the transition dynamics. This is achieved by using random features to generate reward functions during training, and incorporating model predictive control with open-loop policies in online planning. We show that the approach enables fast adaptation to problems with completely new reward functions, while scaling to high-dimensional observations and long horizons. Moreover, our method can easily be trained on large offline datasets and quickly deployed on new tasks with good performance, making it more widely applicable than typical model-free and model-based RL methods. We evaluate the superior performance of our algorithm in a variety of RL and robotics domains.
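The abstract describes a two-stage recipe: train value functions against many randomly generated reward features, then, for a new task, express its reward as a combination of those features and plan online. Below is a minimal, hedged sketch of the random-feature and task-regression steps only, not the paper's actual implementation; the feature form (random Fourier-style features), the least-squares fit, and all names (random_reward_features, fit_task_weights, N_FEATURES, etc.) are illustrative assumptions, and the MPC/open-loop planning stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
N_FEATURES = 64      # number of random reward features
OBS_DIM = 8          # observation dimension
ACT_DIM = 2          # action dimension

# Random projection defining the reward features phi_i(s, a).
W = rng.normal(size=(N_FEATURES, OBS_DIM + ACT_DIM))
b = rng.uniform(0.0, 2.0 * np.pi, size=N_FEATURES)

def random_reward_features(obs, act):
    """Random Fourier-style features of (s, a); each coordinate is
    treated as one training-time reward function."""
    x = np.concatenate([obs, act])
    return np.cos(W @ x + b)

def fit_task_weights(obs_batch, act_batch, rewards):
    """At deployment, regress the new task's observed rewards onto the
    random features: r(s, a) ~= w^T phi(s, a)."""
    Phi = np.stack([random_reward_features(o, a)
                    for o, a in zip(obs_batch, act_batch)])
    w, *_ = np.linalg.lstsq(Phi, rewards, rcond=None)
    return w

def transferred_q_value(q_values_per_feature, w):
    """Given Q-values q_i(s, a) trained (model-free, e.g. offline) for
    each random reward phi_i, linearity of returns in the reward gives
    an estimate w^T q(s, a) for the new task, which can then seed
    model predictive control with open-loop action sequences."""
    return w @ q_values_per_feature
```

The appeal of this structure is that the expensive, model-free value learning happens once against reward-agnostic random features, while adapting to a new reward reduces to a cheap linear regression plus online planning.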
Keywords
Reinforcement Learning, Model-free, Transfer, Random Features