Strategic Choices: Small Budgets and Simple Regret

Technologies and Applications of Artificial Intelligence (2012)

Citations: 9 | Views: 1
Abstract
In many decision problems, there are two levels of choice: the first is strategic and the second is tactical. We formalize the difference between the two, discuss the relevance of the bandit literature for strategic decisions, and test the quality of different bandit algorithms on real-world examples such as board games and card games. For exploration-exploitation algorithms, we evaluate Upper Confidence Bounds and Exponential Weights, as well as algorithms designed for simple regret, such as Successive Reject. For the exploitation, we also evaluate Bernstein Races and Uniform Sampling. As for the recommendation part, we test Empirically Best Arm, Most Played, Lower Confidence Bounds and Empirical Distribution. In the one-player case, we recommend Upper Confidence Bound as an exploration algorithm (and in particular its variant adaptUCBE for parameter-free simple regret) and Lower Confidence Bound or Most Played Arm as recommendation algorithms. In the two-player case, we point out the convenience and efficiency of the EXP3 algorithm, and the very clear improvement provided by the truncation algorithm TEXP3. Incidentally, our algorithm won some games against professional players in kill-all Go (to the best of our knowledge, a first in computer games).
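The exploration-then-recommendation split described above can be illustrated with a minimal sketch: an exploration policy spends a fixed budget pulling arms, and a separate recommendation policy then names a single arm. The sketch below uses textbook UCB1 over hypothetical Bernoulli arms (not the paper's adaptUCBE variant) and compares two of the recommendation rules the abstract lists, Empirically Best Arm and Most Played Arm; the arm means and budget are illustrative assumptions.

```python
import math
import random


def ucb1_explore(means, budget, rng):
    """Pull arms for `budget` rounds with UCB1.

    `means` are hypothetical Bernoulli arm parameters (an assumption for
    this sketch). Returns per-arm pull counts and empirical means.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        if t < k:
            arm = t  # initialization: pull each arm once
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t + 1) / counts[i]),
            )
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
    emp = [sums[i] / counts[i] for i in range(k)]
    return counts, emp


rng = random.Random(0)
means = [0.40, 0.50, 0.55]  # illustrative arms; arm 2 is best
counts, emp = ucb1_explore(means, budget=2000, rng=rng)

# Two recommendation policies applied to the same exploration data:
rec_best_emp = max(range(len(means)), key=lambda i: emp[i])      # Empirically Best Arm
rec_most_played = max(range(len(means)), key=lambda i: counts[i])  # Most Played Arm
```

For simple regret only the final recommendation matters, which is why the paper evaluates exploration policies and recommendation policies as separate design choices.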
Keywords
optimisation,strategic decisions,simple regret,exponential weights,recommendation policy,upper confidence bounds,card games,empirically best arm,exploration algorithm,decision making,strategic choices,truncation algorithm texp3,recommendation algorithm,exploration policy,uniform sampling,bernstein races,bandit problems,lower confidence bound,texp3,board games,game theory,different bandit algorithm,exp3 algorithm,empirical distribution,truncation algorithm,upper confidence bound,exploration-exploitation algorithm,most played,small budgets,bandit literature,best arm,lower confidence bounds,decision problems