Improved Regret for Bandit Convex Optimization with Delayed Feedback
CoRR (2024)
Abstract
We investigate bandit convex optimization (BCO) with delayed feedback, where
only the loss value of the chosen action is revealed, and under an arbitrary
delay. Previous studies established a regret bound of O(T^{3/4} + d^{1/3}T^{2/3})
for this problem, where d is the maximum delay, by simply feeding delayed
loss values to the classical bandit gradient descent (BGD) algorithm. In this
paper, we develop a novel algorithm that improves this regret by carefully
exploiting the delayed bandit feedback via a blocking update mechanism. Our
analysis first reveals that the proposed algorithm decouples the joint
effect of the delays and the bandit feedback on the regret, improving the
bound to O(T^{3/4} + √(dT)) for convex functions. Compared with the
previous result, our regret matches the O(T^{3/4}) regret of BGD in the
non-delayed setting for a larger range of delays, namely d = O(√T)
instead of d = O(T^{1/4}). Furthermore, we consider the case of strongly
convex functions, and prove that the proposed algorithm enjoys a better
regret bound of O(T^{2/3} log^{1/3} T + d log T). Finally, we show that in the
special case of unconstrained action sets, it can be simply extended to
achieve a regret bound of O(√(T log T) + d log T) for strongly convex and
smooth functions.
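To make the baseline concrete, the following is a minimal sketch of the approach the abstract attributes to previous studies: one-point bandit gradient descent fed with delayed loss values. The function name, parameter values, toy loss, and projection radius are illustrative assumptions, not details from the paper; the paper's own blocking-update algorithm is not reproduced here.

```python
import numpy as np

def delayed_bgd(loss, T, n, delta=0.1, eta=0.01, delays=None, radius=1.0):
    """Sketch of classical one-point BGD fed with delayed feedback.

    At each round the learner queries the loss at a randomly perturbed
    point, and the observed loss value arrives only after its delay has
    elapsed; every update uses the perturbation that produced the value.
    All hyperparameters here are illustrative, not from the paper.
    """
    if delays is None:
        delays = [0] * T
    x = np.zeros(n)
    pending = {}          # arrival round -> list of (loss value, perturbation)
    total = 0.0
    for t in range(T):
        u = np.random.randn(n)
        u /= np.linalg.norm(u)              # uniform direction on the sphere
        y = x + delta * u                   # perturbed query point
        val = loss(y)
        total += val
        pending.setdefault(t + delays[t], []).append((val, u))
        # apply every piece of feedback that arrives at this round
        for v, uu in pending.pop(t, []):
            g = (n / delta) * v * uu        # one-point gradient estimate
            x = x - eta * g
            nrm = np.linalg.norm(x)
            if nrm > radius:                # project back onto the ball
                x *= radius / nrm
    return x, total / T
```

Feedback whose delay pushes its arrival past the horizon T is simply discarded, which is the standard convention in the delayed setting.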