On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
NeurIPS 2023
Abstract
We seek to understand what facilitates sample-efficient learning from
historical datasets for sequential decision-making, a problem that is popularly
known as offline reinforcement learning (RL). Further, we are interested in
algorithms that enjoy sample efficiency while leveraging (value) function
approximation. In this paper, we address these fundamental questions by (i)
proposing a notion of data diversity that subsumes the previous notions of
coverage measures in offline RL and (ii) using this notion to unify three
distinct classes of offline RL algorithms based on version spaces (VS),
regularized optimization (RO), and posterior sampling (PS). We establish that
VS-based, RO-based, and PS-based algorithms, under standard assumptions,
achieve comparable sample efficiency, recovering the state-of-the-art
sub-optimality bounds for finite and linear model classes. This result is
surprising, given that prior work suggested an unfavorable sample complexity
for the RO-based algorithm compared
to the VS-based algorithm, whereas posterior sampling is rarely considered in
offline RL due to its explorative nature. Notably, our proposed model-free
PS-based algorithm for offline RL is novel, with sub-optimality bounds that
are frequentist (i.e., worst-case) in nature.
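As a loose illustration of the posterior-sampling principle the abstract builds on (not the paper's model-free algorithm or its frequentist analysis), the sketch below fits a Bayesian linear model to a hypothetical offline dataset, draws one model from the posterior, and acts greedily under that draw; all names, dimensions, and data here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset for a linear reward model:
# features of logged (state, action) pairs and noisy observed rewards.
d, n = 3, 200
theta_true = np.array([1.0, -0.5, 0.25])      # unknown reward parameter
X = rng.normal(size=(n, d))                    # logged features
y = X @ theta_true + 0.1 * rng.normal(size=n)  # logged rewards

# Gaussian prior N(0, I) and noise variance sigma^2 give a Gaussian posterior:
#   cov = (X^T X / sigma^2 + I)^{-1},  mean = cov @ X^T y / sigma^2.
sigma2 = 0.01
cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(d))
mean = cov @ (X.T @ y) / sigma2

# Posterior sampling: draw one model from the posterior, then choose the
# action whose features maximize reward under the sampled model.
theta_sample = rng.multivariate_normal(mean, cov)
candidate_actions = np.eye(d)                  # hypothetical action features
best = candidate_actions[np.argmax(candidate_actions @ theta_sample)]
```

With enough logged data the posterior concentrates, so the greedy choice under a single posterior draw matches the choice under the true parameter; the paper's contribution is to show that this style of algorithm admits worst-case (frequentist) sub-optimality bounds in offline RL with function approximation.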