Index policy for multiarmed bandit problem with dynamic risk measures

European Journal of Operational Research (2024)

Abstract
The multiarmed bandit problem (MAB) is a classic problem in which a finite amount of resources must be allocated among competing choices, with the aim of identifying a policy that maximizes the expected total reward. MAB has a wide range of applications, including clinical trials, portfolio design, parameter tuning, internet advertising, auction mechanisms, adaptive routing in networks, and project management. The classical MAB makes the strong assumption that the decision maker is risk-neutral and indifferent to the variability of the outcome. In many real-life applications, however, this assumption does not hold and decision makers are risk-averse. To address this, we study risk-averse control of the multiarmed bandit problem using dynamic coherent risk measures, seeking a policy with the best risk-adjusted total discounted return. For this setting, we present a theoretical analysis based on Whittle's retirement problem and propose a priority-index policy that reduces to the Gittins index when the level of risk-aversion converges to zero. We generalize the restart formulation of the Gittins index to compute these risk-averse allocation indices effectively. Numerical results show the excellent performance of this heuristic approach for two well-known coherent risk measures: first-order mean-semideviation and mean-AVaR. Our experimental studies suggest that there is no guarantee that an index-based optimal policy exists for the risk-averse problem. Nonetheless, our risk-averse allocation indices can achieve optimal or near-optimal policies, which in some instances are easier to interpret than the exact optimal policy. © 2023 Elsevier B.V. All rights reserved.
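As an illustration of the restart formulation mentioned in the abstract, the following is a minimal sketch of the classical, risk-neutral case only (the one the proposed index reduces to as risk-aversion goes to zero). It is not the paper's risk-averse generalization. The function name `gittins_index_restart`, its parameters, and the use of plain value iteration are assumptions made for this example. For a single arm with transition matrix `P`, rewards `r`, and discount factor `beta`, the index of a state is taken as (1 - beta) times the optimal value of an auxiliary problem in which, at every step, one may either continue from the current state or restart from that state.

```python
def gittins_index_restart(P, r, beta, state, tol=1e-10, max_iter=100_000):
    """Risk-neutral Gittins index of `state` via the restart formulation.

    P     : list of rows, P[i][j] = transition probability from i to j
    r     : list of one-step rewards, r[i] collected when the arm is in state i
    beta  : discount factor, 0 < beta < 1
    state : index of the state whose Gittins index is computed
    (All names are hypothetical and chosen for this sketch.)
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iter):
        # Value of continuing from each state i under the current estimate v.
        cont = [r[i] + beta * sum(P[i][j] * v[j] for j in range(n))
                for i in range(n)]
        # Restarting means acting as if the arm were in `state` again.
        restart = cont[state]
        v_new = [max(c, restart) for c in cont]
        gap = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
        if gap < tol:
            break
    # Scale the restart-problem value to a per-period reward rate.
    return (1.0 - beta) * v[state]


# Example: an arm that pays 0 in state 0, then moves to an absorbing
# state 1 that pays 1 forever.
P = [[0.0, 1.0],
     [0.0, 1.0]]
r = [0.0, 1.0]
print(gittins_index_restart(P, r, beta=0.9, state=0))
print(gittins_index_restart(P, r, beta=0.9, state=1))
```

Value iteration is used here only because it is short and easy to check; any solver for the discounted restart problem would serve the same purpose.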
Keywords
Stochastic programming, Multiarmed bandit problem, Gittins index, Dynamic coherent risk measures, Risk-averse control