Adaptivity to Smoothness in X-armed bandits.
COLT (2018)
Abstract
We study the stochastic continuum-armed bandit problem from the angle of adaptivity to the unknown regularity of the reward function f. We prove that there exists no strategy for the cumulative regret that adapts optimally to the smoothness of f. We show, however, that such minimax optimal adaptive strategies exist if the learner is given extra information about f. Finally, we complement our positive results with matching lower bounds.