
Adaptive Optimal Control of Nonlinear Systems with Multiple Time-scale Eligibility Traces

2023 62nd IEEE Conference on Decision and Control (CDC), 2023

Abstract
Adaptive dynamic programming (ADP) is one of the main methods for solving the optimal control problem of nonlinear systems. In recent years, eligibility traces have been used to reduce the burden of computing the value function, but existing fixed eligibility traces struggle to ensure stable convergence, especially under environmental changes and with complex neural network structures. To address these issues, a novel off-policy algorithm, T-HDP(lambda) with Multiple Timescale Eligibility Traces (MET), is proposed. With MET, the new algorithm adaptively accumulates gradients and incorporates more gradient information, which steers the control toward the optimal direction faster. T-step truncated lambda-returns are used to solve infinite-horizon optimal control problems, and a new importance sampling ratio is designed to correct the value function. Furthermore, the convergence and boundedness of the algorithm are proved. The optimal value function and policy are approximated using an actor-critic network architecture. Finally, a simulation example shows that the proposed algorithm converges faster and with lower variance than the original algorithm.
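To make the notion of a T-step truncated lambda-return concrete, the following is a minimal sketch of the standard textbook definition, not the paper's T-HDP(lambda) algorithm. It omits the multiple timescale eligibility traces and the importance sampling ratio, whose exact forms are not given in the abstract. The function names, the `rewards`/`values` list layout, and the symbols `gamma` (discount factor) and `lam` (lambda) are assumptions introduced for illustration.

```python
def n_step_return(rewards, values, t, n, gamma):
    """n-step return: discounted sum of n rewards plus a bootstrapped value.

    Assumed layout: rewards[k] is received on the transition from state k
    to state k+1, and values[k] is the current value estimate of state k.
    """
    g = 0.0
    for k in range(n):
        g += (gamma ** k) * rewards[t + k]
    return g + (gamma ** n) * values[t + n]


def truncated_lambda_return(rewards, values, t, T, gamma, lam):
    """T-step truncated lambda-return (textbook form).

    Geometrically weights the 1..T-1 step returns by (1 - lam) * lam^(n-1)
    and assigns the remaining weight lam^(T-1) to the T-step return,
    so the weights sum to one.
    """
    total = 0.0
    for n in range(1, T):
        weight = (1.0 - lam) * (lam ** (n - 1))
        total += weight * n_step_return(rewards, values, t, n, gamma)
    total += (lam ** (T - 1)) * n_step_return(rewards, values, t, T, gamma)
    return total
```

Under these assumptions, setting `lam=0` recovers the one-step bootstrapped return and `lam=1` recovers the plain T-step return, which is why truncation lets the target be computed from a finite window even for an infinite-horizon problem.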