Contextual Multinomial Logit Bandits with General Value Functions
CoRR(2024)
摘要
Contextual multinomial logit (MNL) bandits capture many real-world assortment
recommendation problems such as online retailing/advertising. However, prior
work has only considered (generalized) linear value functions, which greatly
limits its applicability. Motivated by this fact, in this work, we consider
contextual MNL bandits with a general value function class that contains the
ground truth, borrowing ideas from a recent trend of studies on contextual
bandits. Specifically, we consider both the stochastic and the adversarial
settings, and propose a suite of algorithms, each with different
computation-regret trade-off. When applied to the linear case, our results not
only are the first ones with no dependence on a certain problem-dependent
constant that can be exponentially large, but also enjoy other advantages such
as computational efficiency, dimension-free regret bounds, or the ability to
handle completely adversarial contexts and rewards.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要