Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization
CoRR (2024)
Abstract
Adapting to a priori unknown noise level is a very important but challenging
problem in sequential decision-making as efficient exploration typically
requires knowledge of the noise level, which is often loosely specified. We
report significant progress in addressing this issue in linear bandits in two
respects. First, we propose a novel confidence set that is 'semi-adaptive' to
the unknown sub-Gaussian parameter σ_*^2 in the sense that the
(normalized) confidence width scales with √(dσ_*^2 + σ_0^2)
where d is the dimension and σ_0^2 is the specified sub-Gaussian
parameter (known) that can be much larger than σ_*^2. This is a
significant improvement over √(dσ_0^2) of the standard confidence
set of Abbasi-Yadkori et al. (2011), especially when d is large. We show that
this leads to an improved regret bound in linear bandits. Second, for bounded
rewards, we propose a novel variance-adaptive confidence set that has a much
improved numerical performance upon prior art. We then apply this confidence
set to develop, as we claim, the first practical variance-adaptive linear
bandit algorithm via an optimistic approach, which is enabled by our novel
regret analysis technique. Both of our confidence sets rely critically on
'regret equality' from online learning. Our empirical evaluation in Bayesian
optimization tasks shows that our algorithms perform comparably to or better
than existing methods.
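The abstract's key quantitative claim is the improvement in the (normalized) confidence width: from √(dσ_0^2) for the standard set of Abbasi-Yadkori et al. (2011) to √(dσ_*^2 + σ_0^2) for the semi-adaptive set, so the over-specified parameter σ_0^2 no longer multiplies the dimension d. A minimal numerical sketch of the two scalings (function names and the example values are illustrative, not from the paper):

```python
import math

def standard_width(d, sigma0_sq):
    # Scaling of the standard (Abbasi-Yadkori et al., 2011) confidence width:
    # sqrt(d * sigma0^2), driven entirely by the specified bound sigma0^2.
    return math.sqrt(d * sigma0_sq)

def semi_adaptive_width(d, sigma_star_sq, sigma0_sq):
    # Scaling of the proposed semi-adaptive confidence width:
    # sqrt(d * sigma_star^2 + sigma0^2); the specified sigma0^2 enters
    # only additively, not multiplied by the dimension d.
    return math.sqrt(d * sigma_star_sq + sigma0_sq)

# Hypothetical example: the true noise level sigma_*^2 is much smaller
# than the loosely specified sigma_0^2, and the dimension d is large.
d, sigma_star_sq, sigma0_sq = 100, 0.01, 1.0
print(standard_width(d, sigma0_sq))                       # 10.0
print(semi_adaptive_width(d, sigma_star_sq, sigma0_sq))   # sqrt(2) ~= 1.414
```

As d grows with σ_*^2 ≪ σ_0^2, the gap between the two widths widens, which is the regime where the abstract claims the improvement matters most.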