The Update-Equivalence Framework for Decision-Time Planning
arXiv (2023)
Abstract
The process of revising (or constructing) a policy at execution time – known
as decision-time planning – has been key to achieving superhuman performance
in perfect-information games like chess and Go. A recent line of work has
extended decision-time planning to imperfect-information games, leading to
superhuman performance in poker. However, these methods involve solving
subgames whose sizes grow quickly in the amount of non-public information,
making them unhelpful when the amount of non-public information is large.
Motivated by this issue, we introduce an alternative framework for
decision-time planning that is not based on solving subgames, but rather on
update equivalence. In this update-equivalence framework, decision-time
planning algorithms replicate the updates of last-iterate algorithms, which
need not rely on public information. This facilitates scalability to games with
large amounts of non-public information. Using this framework, we derive a
provably sound search algorithm for fully cooperative games based on mirror
descent and a search algorithm for adversarial games based on magnetic mirror
descent. We validate the performance of these algorithms in cooperative and
adversarial domains, notably in Hanabi, the standard benchmark for search in
fully cooperative imperfect-information games. Here, our mirror descent
approach exceeds or matches the performance of public information-based search
while using two orders of magnitude less search time. This is the first
instance of a non-public-information-based algorithm outperforming
public-information-based approaches in a domain they have historically
dominated.
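As a minimal, self-contained sketch of the two last-iterate updates the abstract names, the snippet below applies one mirror descent step and one magnetic mirror descent step to a single decision point's policy, assuming the standard negative-entropy (softmax) mirror map. This illustrates only the update rules, not the paper's full decision-time planning procedure; all function and parameter names here are our own.

```python
import math

def mirror_descent_step(policy, q_values, lr):
    # Exponentiated-gradient form of mirror descent with the
    # negative-entropy mirror map: pi'(a) is proportional to
    # pi(a) * exp(lr * q(a)).
    logits = [math.log(p) + lr * q for p, q in zip(policy, q_values)]
    m = max(logits)  # subtract max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]

def magnetic_mirror_descent_step(policy, q_values, magnet, lr, alpha):
    # Closed-form magnetic mirror descent step on the simplex, with
    # magnet policy `magnet` and regularization strength `alpha`:
    # pi'(a) is proportional to
    #   [pi(a) * magnet(a)**(lr*alpha) * exp(lr*q(a))] ** (1/(1+lr*alpha)).
    c = 1.0 + lr * alpha
    logits = [(math.log(p) + lr * alpha * math.log(mu) + lr * q) / c
              for p, q, mu in zip(policy, q_values, magnet)]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]

# Usage: starting from a uniform policy over three actions, an update
# with action values favoring the first action shifts mass toward it.
uniform = [1 / 3, 1 / 3, 1 / 3]
q = [1.0, 0.0, 0.0]
step1 = mirror_descent_step(uniform, q, lr=1.0)
step2 = magnetic_mirror_descent_step(step1, q, uniform, lr=1.0, alpha=0.5)
```

Both steps operate on a single information state's action distribution and require no public belief state, which is the property that lets update-equivalent search scale with the amount of non-public information.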