Improved Bound for Robust Causal Bandits with Linear Models

CoRR(2024)

Abstract
This paper investigates the robustness of causal bandits (CBs) in the face of temporal model fluctuations. This setting deviates from the existing literature's widely-adopted assumption of constant causal models. The focus is on causal systems with linear structural equation models (SEMs). The SEMs and the time-varying pre- and post-interventional statistical models are all unknown and subject to variations over time. The goal is to design a sequence of interventions that incur the smallest cumulative regret compared to an oracle aware of the entire causal model and its fluctuations. A robust CB algorithm is proposed, and its cumulative regret is analyzed by establishing both upper and lower bounds on the regret. It is shown that in a graph with maximum in-degree $d$, length of the largest causal path $L$, and an aggregate model deviation $C$, the regret is upper bounded by $\tilde{\mathcal{O}}\big(d^{L-\frac{1}{2}}(\sqrt{T} + C)\big)$ and lower bounded by $\Omega\big(d^{\frac{L}{2}-2}\max\{\sqrt{T}, d^{2}C\}\big)$. The proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$, maintaining sub-linear regret for a broad range of $C$.
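
The last claim of the abstract follows from the stated bounds in a few lines. The sketch below is not taken from the paper: it assumes $d$ and $L$ are held fixed as $T$ grows, and it writes $R(T)$ for the cumulative regret, a symbol introduced here only for illustration.

```latex
% Assumption: d and L are constants; R(T) is a hypothetical symbol for cumulative regret.
\begin{align*}
R(T) &\le \tilde{\mathcal{O}}\!\big(d^{L-\frac{1}{2}}(\sqrt{T} + C)\big)
      && \text{(upper bound from the abstract)} \\
\sqrt{T} + C &= \sqrt{T}\,\Big(1 + \tfrac{C}{\sqrt{T}}\Big) = \sqrt{T}\,\big(1 + o(1)\big)
      && \text{when } C = o(\sqrt{T}) \\
\Rightarrow\quad R(T) &\le \tilde{\mathcal{O}}\!\big(d^{L-\frac{1}{2}}\sqrt{T}\big)
      = \tilde{\mathcal{O}}(\sqrt{T})
      && \text{for fixed } d, L \\
R(T) &\ge \Omega\!\big(d^{\frac{L}{2}-2}\max\{\sqrt{T},\, d^{2}C\}\big)
      \ge \Omega\!\big(d^{\frac{L}{2}-2}\sqrt{T}\big)
      && \text{(lower bound, using } \max\{a,b\} \ge a\text{)}
\end{align*}
```

Under these assumptions the upper and lower bounds agree in their dependence on $T$ up to logarithmic factors, which is the sense in which the regret is nearly optimal. The same upper bound stays sub-linear in $T$ as long as $C$ grows more slowly than $T$, again up to the logarithmic factors hidden by $\tilde{\mathcal{O}}$.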