A modified Thompson sampling-based learning algorithm for unknown linear systems.

Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

CDC (2022)

Abstract

We revisit the Thompson sampling-based learning algorithm for controlling an unknown linear system with quadratic cost proposed in [1]. This algorithm operates in episodes of dynamic length and it is shown to have a regret bound of $\tilde{O}(\sqrt{T})$, where $T$ is the time-horizon. The regret bound of this algorithm is obtained under a technical assumption on the induced norm of the closed-loop system. We propose a variation of this algorithm that enforces a lower bound $T_{\min}$ on the episode length. We show that a careful choice of $T_{\min}$ (that depends on the uncertainty about the system model) allows us to recover the $\tilde{O}(\sqrt{T})$ regret bound under a milder technical condition about the closed-loop system.
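
To make the idea of a minimum episode length concrete, here is a minimal Python sketch of an episode-scheduling rule. It is an illustration under stated assumptions, not the authors' algorithm: the abstract only says that episodes have dynamic length and that a lower bound on the episode length is enforced. The two triggers used here (the episode outgrowing the previous one, or the determinant of the posterior covariance halving) and the names `new_episode_due` and `episode_lengths` are hypothetical choices made for this example.

```python
def new_episode_due(elapsed, prev_length, det_now, det_start, t_min):
    """Return True when the current episode should end.

    elapsed:      steps since the current episode began
    prev_length:  length of the previous episode
    det_now:      determinant of the posterior covariance at the current step
    det_start:    the same determinant at the start of the episode
    t_min:        enforced lower bound on the episode length

    The two triggers below are assumptions made for illustration only.
    """
    # The lower bound always wins: no episode may end before t_min steps.
    if elapsed < t_min:
        return False
    grew_past_previous = elapsed > prev_length
    uncertainty_halved = det_now < 0.5 * det_start
    return grew_past_previous or uncertainty_halved


def episode_lengths(dets, t_min):
    """Split a horizon into episodes, given one determinant value per step."""
    lengths = []
    start, prev_length = 0, 0
    for t in range(1, len(dets)):
        if new_episode_due(t - start, prev_length, dets[t], dets[start], t_min):
            prev_length = t - start
            lengths.append(prev_length)
            start = t
    # The final episode may be cut short by the end of the horizon.
    lengths.append(len(dets) - start)
    return lengths


if __name__ == "__main__":
    # Synthetic, slowly shrinking determinant sequence (not real posterior data).
    dets = [1.0 / (t + 1) for t in range(50)]
    print(episode_lengths(dets, t_min=5))
```

In a full learning loop, a new model would be sampled from the posterior at the start of each episode and the corresponding controller applied until this rule fires; that part is omitted here.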

Keywords

learning, linear systems, algorithm, sampling-based
