Improving the Exploration in Upper Confidence Trees.

Adrien Couëtoux, Hassen Doghmen, Olivier Teytaud

LION (2012)


Abstract

In the standard version of the UCT algorithm, when the set of decisions is continuous, new decisions are explored through blind search. This can make exploration very inefficient, particularly in high-dimensional problems, which arise often in energy management, for instance. To use the information gathered through past simulations to better explore new decisions, we propose a method named Blind Value (BV). It requires only access to a function that randomly draws feasible decisions. We implement it and compare it to the original version of continuous UCT. Our results show a significant increase in convergence speed in dimensions 12 and 80.
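The abstract does not state how BV scores a candidate decision, so the following Python sketch only illustrates the general idea: draw several feasible decisions at random, then use statistics from already-visited decisions to choose among them instead of taking the first draw. The scoring rule (distance to a visited decision plus that decision's UCB score, minimized over visited decisions), the names `ucb`, `pick_new_decision`, and `draw_feasible`, and the parameters `rho`, `c`, and `n_candidates` are all assumptions made for illustration and should not be read as the authors' exact method.

```python
import math


def ucb(mean_reward, visits, total_visits, c=1.0):
    """UCB1-style score for a decision visited at least once (assumed form)."""
    return mean_reward + c * math.sqrt(math.log(total_visits) / visits)


def pick_new_decision(visited, draw_feasible, n_candidates=20, rho=1.0, c=1.0):
    """Pick the next decision to add at a node.

    visited: list of (decision, mean_reward, visits), decision being a tuple of floats.
    draw_feasible: zero-argument function returning a random feasible decision;
        this is the only access to the decision space the sketch needs.
    rho: hypothetical weight trading off distance against the UCB term.
    """
    candidates = [draw_feasible() for _ in range(n_candidates)]
    if not visited:
        # No past simulations to learn from: fall back to a blind draw.
        return candidates[0]

    total = sum(v for _, _, v in visited)

    def score(y):
        # A candidate is bounded by its most pessimistic visited neighbour:
        # far from everything visited, or close to a promising decision,
        # both give a high score.
        return min(rho * math.dist(x, y) + ucb(m, v, total, c)
                   for x, m, v in visited)

    return max(candidates, key=score)
```

With a uniform sampler such as `lambda: tuple(random.random() for _ in range(12))`, the function can replace the blind draw wherever continuous UCT adds a new decision to a node; the extra cost per call is `n_candidates` times the number of visited decisions.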


Keywords

new decision, UCT algorithm, continuous UCT, continuous set, inefficient exploration, original version, standard version, Blind Value, blind search, convergence speed, upper confidence tree
