JP-DouZero: an enhanced DouDiZhu AI based on reinforcement learning with peasant collaboration and intrinsic rewards.

International Conference on Big Data Computing and Communications (2023)

Abstract
DouDiZhu is a popular Chinese three-player poker game in which two peasants collaborate against a landlord. Its high complexity for a reinforcement learning-based AI stems from three factors: imperfect information, the coexistence of competition and cooperation, and huge state and action spaces. The current state-of-the-art system, DouZero, combines Monte Carlo methods with deep neural networks and relies on self-play without human expertise. This paper proposes JP-DouZero, which addresses two shortcomings of existing methods: a) the cooperation between the two peasants is not explicitly modeled, and b) the reward is sparse, i.e., state-action trajectories receive only a binary score based on whether they lead to a win or a loss at the end of the game. For the former, we design a joint peasant Q-network that determines the reward of every state-action pair from the standpoint of the peasant coalition. For the latter, we devise a new reward mechanism comprising three parts: a curiosity-driven reward, a result-driven reward, and an extrinsic reward. Extensive experiments corroborate a significant increase in the peasant advantage, with a 2.2% higher winning rate and an over 0.12 higher difference of scored points compared with the DouZero baseline. An ablation study shows the impact of the individual design choices on the overall performance improvement.
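As a rough illustration of the three-part reward described above (the abstract gives no formulas, so the functions, weights, and the prediction-error form of the curiosity term below are all assumptions, not the paper's actual design), such a mechanism is often realized as a weighted sum of an extrinsic game reward, a result-driven term, and an intrinsic curiosity bonus:

```python
import numpy as np

def curiosity_reward(pred_next_state, next_state):
    # Hypothetical intrinsic reward: prediction error of a forward model,
    # a common curiosity formulation; the paper's exact definition may differ.
    pred_next_state = np.asarray(pred_next_state, dtype=float)
    next_state = np.asarray(next_state, dtype=float)
    return float(np.mean((pred_next_state - next_state) ** 2))

def combined_reward(r_extrinsic, r_result, r_curiosity,
                    w_result=0.5, w_curiosity=0.1):
    # Weighted combination of the three components; the weights here
    # are illustrative placeholders, not values from the paper.
    return r_extrinsic + w_result * r_result + w_curiosity * r_curiosity

# Example: a won game (extrinsic +1), a positive intermediate result signal,
# and zero curiosity bonus for a perfectly predicted next state.
r_c = curiosity_reward([0.2, 0.8], [0.2, 0.8])
r = combined_reward(1.0, 1.0, r_c)
```

The point of such a shaping scheme is that intermediate state-action pairs receive a dense learning signal instead of only the binary win/loss outcome at the end of the game.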
Keywords
joint peasant, sparse reward, Q-network