Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets
CoRR (2023)
Abstract
We propose policy gradient algorithms for robust infinite-horizon Markov
decision processes (MDPs) with non-rectangular uncertainty sets, thereby
addressing an open challenge in the robust MDP literature. Indeed, uncertainty
sets that display statistical optimality properties and make optimal use of
limited data often fail to be rectangular. Unfortunately, the corresponding
robust MDPs cannot be solved with dynamic programming techniques and are in
fact provably intractable. We first present a randomized projected Langevin
dynamics algorithm that solves the robust policy evaluation problem to global
optimality but is inefficient. We also propose a deterministic policy gradient
method that is efficient but solves the robust policy evaluation problem only
approximately, and we prove that the approximation error scales with a new
measure of non-rectangularity of the uncertainty set. Finally, we describe an
actor-critic algorithm that finds an ϵ-optimal solution for the robust
policy improvement problem in 𝒪(1/ϵ^4) iterations. We thus
present the first complete solution scheme for robust MDPs with non-rectangular
uncertainty sets offering global optimality guarantees. Numerical experiments
show that our algorithms compare favorably against state-of-the-art methods.
Keywords
robust MDPs, uncertainty, non-rectangular