A modified Thompson sampling-based learning algorithm for unknown linear systems

CDC (2022)

Abstract
We revisit the Thompson sampling-based learning algorithm proposed in [1] for controlling an unknown linear system with quadratic cost. This algorithm operates in episodes of dynamic length and has been shown to achieve a regret bound of $\tilde{O}(\sqrt{T})$, where $T$ is the time horizon. This regret bound, however, is obtained under a technical assumption on the induced norm of the closed-loop system. We propose a variation of the algorithm that enforces a lower bound $T_{\min}$ on the episode length. We show that a careful choice of $T_{\min}$ (one that depends on the uncertainty about the system model) recovers the $\tilde{O}(\sqrt{T})$ regret bound under a milder technical condition on the closed-loop system.
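The abstract does not spell out when an episode ends. The sketch below is a minimal illustration under stated assumptions: it assumes the episode-termination rule of Thompson sampling with dynamic episodes (TSDE), which we take to be the scheme of [1]. Under that rule the current episode ends once it has lasted longer than the previous episode, or once the determinant of the posterior covariance has dropped below half its value at the start of the episode. The sketch then adds the lower bound $T_{\min}$ by forbidding any episode from ending before it has lasted $T_{\min}$ steps. How the paper actually combines these conditions, and how it chooses $T_{\min}$, is not stated here. The function `episode_starts` and its parameters are hypothetical names, and the covariance determinants are passed in precomputed instead of being produced by a posterior update.

```python
from typing import List, Sequence


def episode_starts(det_sigma: Sequence[float], t_min: int) -> List[int]:
    """Return episode start times under an ASSUMED TSDE-style stopping rule
    with an enforced minimum episode length t_min (hypothetical helper).

    det_sigma[t] stands in for det(Sigma_t), the determinant of the posterior
    covariance of the unknown system parameters at time t.
    """
    starts = [0]      # the first episode starts at t = 0
    t_k = 0           # start time of the current episode
    prev_len = 0      # length of the previous episode (0 before the first)
    for t in range(1, len(det_sigma)):
        cur_len = t - t_k
        # Condition 1: the current episode is longer than the previous one.
        grew = cur_len > prev_len
        # Condition 2: posterior covariance determinant has halved since t_k.
        shrunk = det_sigma[t] < 0.5 * det_sigma[t_k]
        # Assumed modification: no episode may end before t_min steps.
        if cur_len >= t_min and (grew or shrunk):
            starts.append(t)
            prev_len = cur_len
            t_k = t
    return starts


if __name__ == "__main__":
    # Covariance shrinking fast, as it might early on when uncertainty is high.
    dets = [0.4 ** t for t in range(10)]
    print(episode_starts(dets, t_min=1))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(episode_starts(dets, t_min=3))  # [0, 3, 6, 9]
```

When the covariance shrinks quickly, the determinant condition alone can end an episode after every step. Enforcing `t_min=3` keeps every episode at least three steps long, which is the kind of behavior a lower bound on episode length is meant to guarantee.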
Keywords
learning, linear systems, algorithm, sampling-based