Rotting Infinitely Many-armed Bandits beyond the Worst-case Rotting: An Adaptive Approach
arXiv (2024)
Abstract
In this study, we consider the infinitely many-armed bandit problem in
rotting environments, where the mean reward of an arm may decrease with each
pull and otherwise remains unchanged. We explore two scenarios that capture
problem-dependent characteristics of the reward decay: the slow-rotting
scenario, in which the cumulative amount of rotting is bounded by V_T, and the
abrupt-rotting scenario, in which the number of rotting instances is bounded
by S_T. To address the challenge posed by rotting rewards, we introduce an
algorithm that uses UCB with an adaptive sliding window, designed to manage the
bias-variance trade-off that rotting rewards induce. Our proposed algorithm
achieves tight regret bounds in both the slow and abrupt rotting scenarios.
Lastly, we demonstrate the performance of our algorithm on synthetic datasets.
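
The bias-variance trade-off behind a sliding-window UCB index can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the noise scale `sigma`, the bonus form sqrt(2 log(horizon) / window), the doubling window sizes, and the rule of taking the smallest index over windows are all assumptions made for illustration. The idea it shows is that a long window has low variance but is biased upward by stale, pre-rotting rewards, while a short window tracks the current mean but has a large confidence bonus. Because a rotting arm's mean only decreases, every window's average overestimates the current mean in expectation, so the smallest index is the tightest of these upper bounds.

```python
import math


def window_ucb(rewards, window, horizon, sigma=1.0):
    """UCB index of one arm from its last `window` observed rewards.

    Hypothetical helper: the bonus form is a standard sub-Gaussian choice,
    assumed here rather than taken from the paper.
    """
    recent = rewards[-window:]
    mean = sum(recent) / len(recent)
    bonus = sigma * math.sqrt(2.0 * math.log(horizon) / len(recent))
    return mean + bonus


def adaptive_window_ucb(rewards, horizon, sigma=1.0):
    """Smallest windowed UCB index over doubling window sizes 1, 2, 4, ...

    Assumed selection rule: each window gives an upper bound on the current
    mean of a rotting arm, so the minimum is the tightest one available.
    """
    windows = []
    w = 1
    while w <= len(rewards):
        windows.append(w)
        w *= 2
    return min(window_ucb(rewards, w, horizon, sigma) for w in windows)


# Noiseless toy arm whose mean drops abruptly from 1.0 to 0.0 after 8 pulls.
rewards = [1.0] * 8 + [0.0] * 8
full_index = window_ucb(rewards, window=len(rewards), horizon=100)
adaptive_index = adaptive_window_ucb(rewards, horizon=100)
print(full_index, adaptive_index)
```

On this toy arm the full-history index is inflated by the eight pre-drop rewards, whereas the adaptive index settles on the window that covers only the post-drop pulls. A policy for infinitely many arms could compare such an index against a threshold to decide when to discard an arm and sample a fresh one; that step is left out of this sketch.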