Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandit

MohammadJavad Azizi, Sheldon M Ross, Zhengyu Zhang

arXiv (2021)

Abstract

We consider the problem of finding, through adaptive sampling, which of n populations (arms) has the largest mean. Our objective is to determine a rule that identifies the best population with a fixed minimum confidence using as few observations as possible, i.e., fixed-confidence (FC) best arm identification (BAI) in multi-armed bandits. We study such problems under the Bayesian setting with both Bernoulli and Gaussian populations. We propose to use the classical vector-at-a-time (VT) rule, which samples each alive population once in each round. We show how VT can be implemented and analyzed in our Bayesian setting, and how it can be improved by early elimination. We also propose and analyze a variant of the classical play-the-winner (PW) algorithm. Numerical results show that these rules compare favorably with state-of-the-art algorithms.
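
To illustrate the vector-at-a-time idea for Bernoulli arms, here is a minimal Python sketch. It is not the authors' algorithm and makes several assumptions of its own: independent uniform Beta(1, 1) priors on the arm means, the leader taken as the arm with the highest posterior mean, and an illustrative union-bound stopping rule (stop once the sum over the other arms j of P(μ_j > μ_leader | data) is at most δ, so the posterior probability that the leader is not best is at most δ). Early elimination and the Gaussian case are omitted. The names `prob_y_beats_x` and `vector_at_a_time` and the `max_rounds` cap are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_y_beats_x(a_x: int, b_x: int, a_y: int, b_y: int) -> float:
    """Exact P(Y > X) for independent X ~ Beta(a_x, b_x), Y ~ Beta(a_y, b_y).

    Closed-form sum, valid when a_y is a positive integer.
    """
    total = 0.0
    for i in range(a_y):
        total += math.exp(
            log_beta(a_x + i, b_x + b_y)
            - math.log(b_y + i)
            - log_beta(1 + i, b_y)
            - log_beta(a_x, b_x)
        )
    return total


def vector_at_a_time(
    means: Sequence[float],
    delta: float,
    rng: random.Random,
    max_rounds: int = 10_000,
) -> Tuple[int, int]:
    """Return (chosen arm, total samples) for simulated Bernoulli arms.

    `means` is used only to simulate observations; the rule itself sees only
    success/failure counts.
    """
    n = len(means)
    successes = [0] * n
    failures = [0] * n
    samples = 0
    for _ in range(max_rounds):
        # One round: every arm is sampled once (no early elimination here).
        for i, p in enumerate(means):
            if rng.random() < p:
                successes[i] += 1
            else:
                failures[i] += 1
            samples += 1
        # Posterior under a uniform prior is Beta(1 + successes, 1 + failures).
        a = [1 + s for s in successes]
        b = [1 + f for f in failures]
        leader = max(range(n), key=lambda i: a[i] / (a[i] + b[i]))
        # Union bound: P(leader is not best | data) <= sum of pairwise terms.
        risk = sum(
            prob_y_beats_x(a[leader], b[leader], a[j], b[j])
            for j in range(n)
            if j != leader
        )
        if risk <= delta:
            return leader, samples
    raise RuntimeError("no decision within max_rounds")
```

For example, `vector_at_a_time([1.0, 0.0, 0.0], 0.05, random.Random(0))` returns `(0, 9)`: with these degenerate means the observations are deterministic, and the union-bound risk falls from 0.1 after two rounds to about 0.029 after three. The exact Beta comparison keeps the stopping check deterministic given the data; a Monte Carlo estimate of P(arm i is best) would be a simpler but noisier alternative.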

Keywords

identification, fixed-confidence, multi-armed