A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting

arXiv (2020)

Abstract
Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with a linearly realizable value function and good feature coverage in the finite-horizon case. In this note we show that, once adapted to the discounted setting, the construction can be simplified to a 2-state MDP with 1-dimensional features, such that learning is impossible even with an infinite amount of data.
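To illustrate the setting the abstract refers to (not the paper's actual lower-bound construction), here is a minimal sketch of a 2-state discounted MDP in which the optimal action-value function is linearly realizable with a 1-dimensional feature map. All transition probabilities, rewards, and the feature choice below are made-up for illustration.

```python
# Illustrative sketch only: a 2-state, 2-action discounted MDP whose optimal
# Q-function is realizable with 1-dimensional features. The numbers here are
# invented examples, not the construction from the note.
import numpy as np

gamma = 0.9  # discount factor

# P[s, a] = next-state distribution; R[s, a] = expected reward.
P = np.array([[[0.0, 1.0],   # state 0, action 0: move to state 1
               [1.0, 0.0]],  # state 0, action 1: stay in state 0
              [[0.0, 1.0],   # state 1 is absorbing under both actions
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 0.0]])

# Compute Q* by value iteration (gamma-contraction, so this converges).
Q = np.zeros((2, 2))
for _ in range(1000):
    V = Q.max(axis=1)          # V*(s) = max_a Q*(s, a)
    Q = R + gamma * (P @ V)    # Bellman optimality backup

# Linear realizability with d = 1: choosing phi(s, a) = Q*(s, a) makes
# Q*(s, a) = theta * phi(s, a) hold exactly with theta = 1. Realizability
# alone is the assumption in the abstract's setting; the note's point is
# that it does not suffice for batch RL even with infinite data.
phi = Q.copy()
theta = 1.0
assert np.allclose(theta * phi, Q)
```

Here Q* works out to [[1, 0.9], [0, 0]]: taking action 0 from state 0 earns reward 1 and then nothing, while staying first discounts that return by gamma.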
Keywords
Wang-Foster-Kakade