LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
arXiv (2024)
Abstract
This study considers the linear contextual bandit problem with independent
and identically distributed (i.i.d.) contexts. In this problem, existing
studies have proposed Best-of-Both-Worlds (BoBW) algorithms whose regrets
satisfy O(log^2(T)) for the number of rounds T in a stochastic regime with
a suboptimality gap lower-bounded by a positive constant, while satisfying
O(√(T)) in an adversarial regime. However, the dependency on T has room
for improvement, and the suboptimality-gap assumption can be relaxed. To address this
issue, this study proposes an algorithm whose regret satisfies O(log(T)) in
the setting where the suboptimality gap is lower-bounded by a positive constant. Furthermore, we
introduce a margin condition, a milder assumption on the suboptimality gap.
That condition characterizes the problem difficulty linked to the suboptimality
gap using a parameter β∈ (0, ∞]. We then show that the
algorithm's regret satisfies
O({log(T)}^((1+β)/(2+β)) T^(1/(2+β))).
Here, β = ∞ corresponds to the case considered in existing studies, in which the
suboptimality gap is lower-bounded by a positive constant, and our regret satisfies
O(log(T)) in that case. Our proposed algorithm is based on
Follow-The-Regularized-Leader (FTRL) with the Tsallis entropy and is referred to as the
α-Linear-Contextual (LC)-Tsallis-INF.
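
As a concrete illustration of the margin condition mentioned above, one common Tsybakov-type parametrization is sketched below. The gap threshold Δ_* and this exact form are illustrative assumptions; the paper's precise condition may differ.

```latex
% A common Tsybakov-type margin condition with parameter beta (illustrative form;
% the threshold \Delta_* is an assumed constant, not taken from the paper).
\[
  \Pr\bigl(0 < \Delta(X) \le \varepsilon\bigr)
    \;\le\; \left(\frac{\varepsilon}{\Delta_*}\right)^{\beta}
  \quad \text{for all } \varepsilon \in (0, \Delta_*],
  \qquad \beta \in (0, \infty].
\]
% As \beta \to \infty, the right-hand side vanishes for every \varepsilon < \Delta_*,
% so the suboptimality gap is at least \Delta_* almost surely; this recovers the
% lower-bounded-gap setting, in which the regret bound above reduces to O(\log T).
```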
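The core of an FTRL method with a Tsallis-entropy regularizer is the step that turns cumulative loss estimates into an arm-sampling distribution. The sketch below shows only this step for a finite arm set with a fixed learning rate and α ∈ (0, 1); the function name, the bisection solver, and the parameter choices are illustrative assumptions. The full α-LC-Tsallis-INF additionally conditions on the observed i.i.d. context and uses the paper's specific loss estimator, neither of which is reproduced here.

```python
import numpy as np

def tsallis_ftrl_probs(cum_loss, eta, alpha=0.5, tol=1e-10, max_iter=200):
    """Minimal sketch of an FTRL step with a Tsallis-entropy regularizer.

    Solves p = argmin_{p in simplex} <p, cum_loss> - (1/(eta*(1-alpha))) * sum_a p_a^alpha,
    assuming alpha in (0, 1). The stationarity condition gives
        p_a = ( alpha / ((1 - alpha) * eta * (cum_loss_a + lam)) )^(1 / (1 - alpha)),
    with the Lagrange multiplier lam chosen by bisection so that p sums to 1.
    In practice cum_loss would hold importance-weighted loss estimates.
    """
    cum_loss = np.asarray(cum_loss, dtype=float)

    def probs(lam):
        return (alpha / ((1.0 - alpha) * eta * (cum_loss + lam))) ** (1.0 / (1.0 - alpha))

    # lam must exceed -min(cum_loss) so every denominator stays positive.
    lo = -cum_loss.min() + 1e-12
    hi = lo + 1.0
    while probs(hi).sum() > 1.0:      # grow the bracket until the mass drops below 1
        hi = lo + 2.0 * (hi - lo)

    for _ in range(max_iter):          # bisection: probs(lam).sum() decreases in lam
        mid = 0.5 * (lo + hi)
        s = probs(mid).sum()
        if abs(s - 1.0) < tol:
            break
        if s > 1.0:
            lo = mid
        else:
            hi = mid

    p = probs(0.5 * (lo + hi))
    return p / p.sum()                 # renormalize to absorb residual bisection slack


# Usage sketch: three arms, arbitrary cumulative loss estimates and learning rate.
p = tsallis_ftrl_probs(cum_loss=[2.0, 0.5, 1.0], eta=0.3, alpha=0.5)
arm = np.random.default_rng(0).choice(len(p), p=p)
```

With α = 1/2 this reduces to the usual Tsallis-INF-style update, where arms with smaller cumulative loss receive polynomially (rather than exponentially) larger probability than with the negative-entropy regularizer.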