Variational Dynamic Programming for Stochastic Optimal Control
arXiv (2024)
Abstract
We consider the problem of stochastic optimal control where the
state-feedback control policies take the form of a probability distribution,
and where a penalty on the entropy is added. By viewing the cost function as a
Kullback-Leibler (KL) divergence between two Markov chains, we bring the tools
from variational inference to bear on our optimal control problem. This allows
for deriving a dynamic programming principle, where the value function is
itself defined as a KL divergence. We then resort to Gaussian distributions to
approximate the control policies, and apply the theory to control-affine
nonlinear systems with quadratic costs. This results in closed-form recursive
updates, which generalize LQR control and the backward Riccati equation. We
illustrate this novel method on the simple problem of stabilizing an inverted
pendulum.
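
For orientation, the classical baseline that the abstract says is generalized can be sketched as follows. This is the standard finite-horizon, discrete-time LQR backward Riccati recursion, not the paper's variational update. The function name `lqr_backward_riccati`, the Euler-discretized pendulum linearization, and all numerical values (`dt`, `g`, `l`, `Q`, `R`, `Q_T`, the horizon) are assumptions chosen for illustration only.

```python
import numpy as np

def lqr_backward_riccati(A, B, Q, R, Q_T, horizon):
    """Classical LQR: backward Riccati recursion for x[t+1] = A x[t] + B u[t].

    Minimizes sum_t (x'Qx + u'Ru) + x_T' Q_T x_T and returns the feedback
    gains K_0..K_{T-1} (with u_t = -K_t x_t) and the cost-to-go matrix P_0.
    """
    P = Q_T
    gains = []
    for _ in range(horizon):
        S = R + B.T @ P @ B                      # control-cost curvature
        K = np.linalg.solve(S, B.T @ P @ A)      # gain for this stage
        P = Q + A.T @ P @ (A - B @ K)            # Riccati update
        P = 0.5 * (P + P.T)                      # keep P numerically symmetric
        gains.append(K)
    gains.reverse()                              # computed backward, applied forward
    return gains, P

# Hypothetical example: pendulum linearized about the upright position,
# state x = (angle, angular velocity), Euler discretization with step dt.
dt, g, l = 0.01, 9.81, 1.0
A = np.array([[1.0, dt], [g / l * dt, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])
Q_T = 10.0 * np.eye(2)

gains, P0 = lqr_backward_riccati(A, B, Q, R, Q_T, horizon=500)

# Roll the linear closed loop forward from a small initial tilt.
x = np.array([0.2, 0.0])
for K in gains:
    x = A @ x - B @ (K @ x)
```

The gains are deterministic here; in the setting described above, the policy is instead a probability distribution (approximated by a Gaussian) and the cost carries an entropy penalty, so this sketch only shows the limiting structure that the closed-form recursive updates are said to extend.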