
Robust Asymmetric Learning in POMDPs

International Conference on Machine Learning (2021)

Abstract
Policies for partially observed Markov decision processes can be efficiently learned by imitating expert policies learned using asymmetric information. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and may therefore encourage actions that are sub-optimal or unsafe under partial information. To address this flaw, we derive an update that, when applied iteratively to an expert, maximizes the expected reward of the trainee's policy. Using this update, we construct a computationally efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and trainee policies. We then show that A2D allows the trainee to safely imitate the modified expert, and outperforms policies learned either by imitating a fixed expert or direct reinforcement learning.
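The abstract describes A2D only at a high level: an expert with privileged (asymmetric) information is iteratively updated so that the partially observing trainee who imitates it maximizes expected reward. As a rough illustration of that interleaved structure, here is a minimal sketch in Python. Everything concrete below — the toy two-state POMDP, the noisy observation model, the tabular softmax policies, the learning rates, and the specific imitation and REINFORCE-style update forms — is an assumption invented for illustration, not the paper's actual derivation or code.

```python
# Illustrative sketch of an A2D-style joint training loop (not the authors' code).
# Hypothetical setup: a tabular POMDP with hidden state s and noisy observation o;
# the expert conditions on s, the trainee only on o.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_OBS, N_ACTIONS, HORIZON = 2, 2, 2, 10

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Tabular logits: expert sees the true state, trainee sees only the observation.
expert_logits = np.zeros((N_STATES, N_ACTIONS))
trainee_logits = np.zeros((N_OBS, N_ACTIONS))

def observe(s):
    # Noisy observation: correct with prob 0.8 (toy partial observability).
    return s if rng.random() < 0.8 else 1 - s

def step(s, a):
    # Toy dynamics and reward, invented for illustration.
    reward = 1.0 if a == s else 0.0
    return rng.integers(N_STATES), reward

def rollout(beta):
    """Roll out a per-step beta-mixture of expert and trainee (DAgger-style)."""
    s = rng.integers(N_STATES)
    traj = []
    for _ in range(HORIZON):
        o = observe(s)
        pi = softmax(expert_logits[s]) if rng.random() < beta else softmax(trainee_logits[o])
        a = rng.choice(N_ACTIONS, p=pi)
        s_next, r = step(s, a)
        traj.append((s, o, a, r))
        s = s_next
    return traj

lr = 0.1
for it in range(200):
    beta = max(0.0, 1.0 - it / 100)  # anneal control from expert to trainee
    traj = rollout(beta)

    # 1) Trainee imitation step: move the trainee's action distribution
    #    (given o) toward the expert's (given s) on visited state/obs pairs.
    for s, o, a, r in traj:
        trainee_logits[o] += lr * (softmax(expert_logits[s]) - softmax(trainee_logits[o]))

    # 2) Expert update: a REINFORCE-style step that credits the expert's
    #    action preferences with the return earned under the mixed rollout,
    #    standing in for the paper's expert-reward update.
    G = sum(r for *_, r in traj)
    for s, o, a, r in traj:
        onehot = np.eye(N_ACTIONS)[a]
        expert_logits[s] += lr * 0.1 * G * (onehot - softmax(expert_logits[s]))
```

The structural point this sketch mirrors from the abstract is that the expert is not fixed: step (2) adjusts the full-information expert toward behavior whose imitation earns the partially observing trainee high reward, which is what distinguishes A2D from imitating a static expert.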