Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
CLeaR (2023)
Abstract
Causal confusion is a phenomenon where an agent learns a policy that reflects
imperfect spurious correlations in the data. Such a policy may falsely appear
to be optimal during training if most of the training data contain such
spurious correlations. This phenomenon is particularly pronounced in domains
such as robotics, with potentially large gaps between the open- and closed-loop
performance of an agent. In such settings, causally confused models may appear
to perform well according to open-loop metrics during training but fail
catastrophically when deployed in the real world. In this paper, we study
causal confusion in offline reinforcement learning. We investigate whether
selectively sampling appropriate points from a dataset of demonstrations may
enable offline reinforcement learning agents to disambiguate the underlying
causal mechanisms of the environment, alleviate causal confusion in offline
reinforcement learning, and produce a safer model for deployment. To answer
this question, we consider a set of tailored offline reinforcement learning
datasets that exhibit causal ambiguity and assess the ability of active
sampling techniques to reduce causal confusion at evaluation. We provide
empirical evidence that uniform and active sampling techniques are able to
consistently reduce causal confusion as training progresses and that active
sampling is able to do so significantly more efficiently than uniform sampling.