Regularizing policy iteration for recursive feasibility and stability.

CDC (2022)

Abstract
We present a new algorithm called policy iteration plus (PI+) for the optimal control of nonlinear deterministic discrete-time plants with general cost functions. PI+ builds upon classical policy iteration and has the distinctive feature of enforcing recursive feasibility under mild conditions, in the sense that the minimization problems solved at each iteration are guaranteed to admit a solution. While recursive feasibility is a desired property, existing results on the policy iteration algorithm fail to ensure it in general, contrary to PI+. We also establish the recursive stability of PI+: the policies generated at each iteration ensure a stability property for the closed-loop system. We prove our results under more general conditions than those currently available for policy iteration, notably by covering set stability. Finally, we present characterizations of near-optimality bounds for PI+ and prove the uniform convergence of the value functions generated by PI+ to the optimal value function. We believe these results will benefit the burgeoning literature on approximate dynamic programming and reinforcement learning, where recursive feasibility is typically assumed without a clear method for verifying it and where recursive stability is essential for the safe operation of the system.
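For background, classical policy iteration, which PI+ builds upon, alternates between evaluating the current policy's cost-to-go and improving the policy greedily against it. The sketch below is a minimal illustration on a hypothetical toy deterministic plant (five states on a line, stage cost equal to the distance to the origin, discount factor 0.9); it is NOT the paper's PI+ algorithm and omits the regularization that PI+ adds to guarantee recursive feasibility.

```python
# Hypothetical toy example of CLASSICAL policy iteration, not PI+.
# Deterministic discrete-time plant: states 0..4 on a line, actions
# move left/right (clamped to the state space), stage cost = |state|.

GAMMA = 0.9                 # discount factor (assumed for illustration)
STATES = tuple(range(5))
ACTIONS = (-1, +1)          # move left / move right

def step(state, action):
    """Deterministic plant dynamics and stage cost."""
    next_state = min(max(state + action, 0), 4)
    cost = abs(state)       # zero stage cost only at the origin
    return next_state, cost

def policy_evaluation(policy, tol=1e-8):
    """Compute the cost-to-go of a fixed policy by fixed-point iteration."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            s_next, cost = step(s, policy[s])
            v_new = cost + GAMMA * V[s_next]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

def policy_improvement(V):
    """Greedy one-step minimization against the current cost-to-go.
    Classical PI assumes this minimum is attained at every iteration;
    PI+ is designed to guarantee that (recursive feasibility)."""
    return {s: min(ACTIONS,
                   key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
            for s in STATES}

def policy_iteration():
    policy = {s: ACTIONS[0] for s in STATES}  # initial policy
    while True:
        V = policy_evaluation(policy)
        improved = policy_improvement(V)
        if improved == policy:
            return policy, V                  # converged
        policy = improved

policy, V = policy_iteration()
```

On this toy plant the algorithm converges to the policy that always moves toward the origin; the abstract's point is that such convergence arguments require each improvement step to admit a minimizer, which PI+ enforces by construction.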
Keywords
dynamic programming, general cost functions, nonlinear deterministic discrete-time plants, optimal control, optimal value function, PI+, policy iteration plus, recursive feasibility, recursive stability, reinforcement learning, set stability