On Oracle-Efficient PAC Reinforcement Learning with Rich Observations

arXiv: Learning (2018)

Cited 85 | Viewed 196
Abstract
We study the computational tractability of provably sample-efficient (PAC) reinforcement learning in episodic environments with rich observations. We present new sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation — accessing policy and value function classes exclusively through standard optimization primitives — and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. In the more general stochastic transition setting, we prove that the only known sample-efficient algorithm, Olive [1], cannot be implemented in our oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.
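To make the oracle model of computation concrete, here is a minimal illustrative sketch (not code from the paper): the learner interacts with a policy class only through an empirical-risk-minimization primitive, never by enumerating or inspecting the class directly. The names `erm_oracle` and `Policy`, and the brute-force search standing in for an efficient solver, are assumptions for illustration only.

```python
# Illustrative sketch of an optimization-oracle interface (hypothetical names;
# not the paper's implementation). The learner may only call the oracle, which
# returns the empirically best policy on logged (observation, action, reward)
# data. The brute-force search inside is a stand-in for an efficient solver.
from typing import Callable, List, Tuple

Policy = Callable[[int], int]  # maps an observation to an action

def erm_oracle(policy_class: List[Policy],
               data: List[Tuple[int, int, float]]) -> Policy:
    """Return the policy in the class with the highest total reward on the
    transitions it agrees with -- the only access the learner has."""
    def score(pi: Policy) -> float:
        return sum(r for (obs, act, r) in data if pi(obs) == act)
    return max(policy_class, key=score)

# Usage: two toy policies; the oracle picks the better fit to the data.
always_zero: Policy = lambda obs: 0
parity: Policy = lambda obs: obs % 2
logged = [(0, 0, 1.0), (1, 1, 1.0)]
best = erm_oracle([always_zero, parity], logged)
```

The point of the model is that replacing the brute-force loop with a real optimization routine leaves the learner's algorithm unchanged, which is what makes oracle-based methods computationally efficient alternatives to enumeration.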