Monte Carlo Tree Search with Variable Simulation Periods for Continuously Running Tasks

2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), 2019

Abstract
Monte Carlo Tree Search (MCTS) is widely used for planning in domains where the possible actions can be represented as a tree of sequential decisions. To select an action efficiently, MCTS usually needs to perform many simulations to build a reliable tree representation of the decision space. A bottleneck therefore arises when too few simulations can be performed between action selections. This is especially pronounced in continuously running tasks, where the time available for simulations between actions tends to be limited because the environment's state is constantly changing. In this paper, we present an approach that extends the time available for Monte Carlo simulations when circumstances allow. The approach balances the prospect of selecting the right action against the time that can be spared for MCTS simulations before the next action selection. To do so, we treat the simulation time as a decision variable that is selected alongside an action. We extend the Hierarchical Optimistic Optimization applied to Trees (HOOT) method to adapt our approach to environments with a continuous decision space. We evaluate our approach on tasks with a continuous decision space using OpenAI Gym's Pendulum and Continuous Mountain Car environments, and on tasks with a discrete action space using the Arcade Learning Environment (ALE) platform. The results show that, with variable simulation times, the proposed approach outperforms conventional MCTS on the evaluated continuous decision space tasks and improves the performance of MCTS on most of the ALE tasks.
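The idea of selecting the simulation time together with the action can be illustrated with a small sketch. The code below is not the paper's algorithm: it is a flat UCB1 bandit (no tree, no HOOT) whose arms are (action, simulation period) pairs, and the class name JointArmBandit, the reward table in toy_reward, and the exploration constant are all hypothetical choices made for illustration only.

```python
import math
import random
from itertools import product


class JointArmBandit:
    """UCB1 over joint (action, simulation_period) arms.

    Illustrative assumption: the simulation period is just one more
    coordinate of the decision, chosen alongside the action.
    """

    def __init__(self, actions, periods, c=1.4):
        self.arms = list(product(actions, periods))
        self.counts = {arm: 0 for arm in self.arms}
        self.means = {arm: 0.0 for arm in self.arms}
        self.total = 0
        self.c = c  # exploration constant (hypothetical value)

    def select(self):
        # Try every arm once before applying the UCB1 rule.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm

        def ucb(arm):
            bonus = self.c * math.sqrt(math.log(self.total) / self.counts[arm])
            return self.means[arm] + bonus

        return max(self.arms, key=ucb)

    def update(self, arm, reward):
        # Incremental running mean of the observed rewards.
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

    def best(self):
        # Recommend the most-visited arm, a common MCTS convention.
        return max(self.arms, key=lambda arm: self.counts[arm])


def toy_reward(arm, rng):
    """Made-up reward: a medium period helps planning, a long one goes stale."""
    action, period = arm
    action_value = {"left": 0.2, "right": 0.6}[action]
    period_value = {1: 0.0, 2: 0.3, 4: 0.1}[period]
    return action_value + period_value + rng.uniform(-0.05, 0.05)


def run_demo(iterations=5000, seed=0):
    rng = random.Random(seed)
    bandit = JointArmBandit(actions=["left", "right"], periods=[1, 2, 4])
    for _ in range(iterations):
        arm = bandit.select()
        bandit.update(arm, toy_reward(arm, rng))
    return bandit


if __name__ == "__main__":
    action, period = run_demo().best()
    print("chosen action:", action, "| simulation period:", period)
```

In this toy setting the bandit weighs the value of an action against the value of the period attached to it, which mirrors the trade-off described in the abstract between choosing the right action and the time spared for simulations. A full treatment would place this choice at the nodes of a search tree and, for continuous decision spaces, replace the finite arm set with a HOOT-style partition.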
Keywords
MCTS, HOOT, Variable simulation periods