Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandit

arxiv(2021)

引用 0|浏览0
暂无评分
摘要
We consider the problem of finding, through adaptive sampling, which of n populations (arms) has the largest mean. Our objective is to determine a rule which identifies the best population with a fixed minimum confidence using as few observations as possible, i.e. fixed-confidence (FC) best arm identification (BAI) in multi-armed bandits. We study such problems under the Bayesian setting with both Bernoulli and Gaussian populations. We propose to use the classical vector at a time (VT) rule, which samples each alive population once in each round. We show how VT can be implemented and analyzed in our Bayesian setting and be improved by early elimination. We also propose and analyze a variant of the classical play the winner (PW) algorithm. Numerical results show that these rules compare favorably with state-of-art algorithms.
更多
查看译文
关键词
identification,fixed-confidence,multi-armed
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要