Chrome Extension
WeChat Mini Program
Use on ChatGLM

A Lyapunov characterization of robust policy optimization

Control Theory and Technology(2023)

Cited 1|Views0
No score
Abstract
In this paper, we study the robustness property of policy optimization (particularly Gauss–Newton gradient descent algorithm which is equivalent to the policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov’s direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions of the upperbound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems’ fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated by the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
More
Translated text
Key words
Policy optimization,Policy iteration (PI),Input-to-state stability (ISS),Lyapunov’s direct method
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined