A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health
CoRR (2024)
Abstract
Restless multi-armed bandits (RMABs) are used to model sequential resource
allocation in public health intervention programs. In these settings, the
underlying transition dynamics are often unknown a priori, requiring online
reinforcement learning (RL). However, existing methods in online RL for RMABs
cannot incorporate properties often present in real-world public health
applications, such as contextual information and non-stationarity. We present
Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs
that combines Bayesian modeling with Thompson sampling in a novel way to
flexibly model a wide range of complex RMAB settings, such as contextual and
non-stationary RMABs. A key contribution of our approach is its ability to
leverage shared information within and between arms to learn unknown RMAB
transition dynamics quickly in budget-constrained settings with relatively
short time horizons. Empirically, we show that BCoR achieves substantially
higher finite-sample performance than existing approaches over a range of
experimental settings, including one constructed from a real-world public
health campaign in India.
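
To make the idea of Thompson sampling under a budget concrete, the following is a minimal illustrative sketch, not the BCoR method itself. It assumes two-state arms, independent Beta(1, 1) priors on each transition probability (so, unlike BCoR, no information is shared across arms and no context is used), and a simple myopic ranking rule in place of a proper index policy. The function name `thompson_rmab` and all parameters are hypothetical.

```python
import numpy as np

def thompson_rmab(true_p, budget, horizon, seed=0):
    """Toy Thompson sampling loop for a two-state restless bandit.

    true_p[i, s, a] is the (unknown to the learner) probability that arm i
    moves to the good state 1 from state s under action a (1 = intervene).
    Assumption: independent Beta(1, 1) priors, myopic arm ranking.
    """
    true_p = np.asarray(true_p, dtype=float)
    rng = np.random.default_rng(seed)
    n_arms = true_p.shape[0]
    arms = np.arange(n_arms)

    # Beta posterior parameters for every (arm, state, action) triple.
    alpha = np.ones_like(true_p)
    beta = np.ones_like(true_p)

    states = rng.integers(0, 2, size=n_arms)
    total_reward = 0
    for _ in range(horizon):
        # 1. Sample one plausible set of transition dynamics from the posterior.
        sampled = rng.beta(alpha, beta)

        # 2. Rank arms by the sampled benefit of intervening in their current
        #    state, and act on the top `budget` arms (myopic simplification).
        gain = sampled[arms, states, 1] - sampled[arms, states, 0]
        chosen = np.argsort(gain)[-budget:]
        actions = np.zeros(n_arms, dtype=int)
        actions[chosen] = 1

        # 3. Every arm transitions, whether or not it was acted on (restless).
        next_states = (rng.random(n_arms) < true_p[arms, states, actions]).astype(int)

        # 4. Conjugate Beta-Bernoulli update from the observed transitions.
        np.add.at(alpha, (arms, states, actions), next_states)
        np.add.at(beta, (arms, states, actions), 1 - next_states)

        states = next_states
        total_reward += int(next_states.sum())  # reward = arms in good state
    return total_reward, alpha, beta

# Example usage on synthetic dynamics (values are arbitrary, for illustration).
demo_p = np.random.default_rng(1).uniform(0.1, 0.9, size=(5, 2, 2))
reward, alpha_post, beta_post = thompson_rmab(demo_p, budget=2, horizon=50)
```

Because each arm here learns only from its own transitions, this sketch would be expected to learn slowly when the horizon is short, which is the limitation that sharing information within and between arms is meant to address.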