Explicitly Maintaining Diverse Playing Styles in Self-Play

ICLR 2023 (2023)

Abstract
Self-play has proven to be an effective training scheme for obtaining a high-level agent in complex games by iteratively playing against opponents drawn from the agent's own historical versions. However, this training process may prevent the agent from learning a well-generalised policy, since it rarely encounters diversely-behaving opponents along its own historical path. In this paper, we aim to improve the generalisation of the policy by maintaining a population of agents with diverse playing styles and high skill levels throughout training. Specifically, we propose a bi-objective optimisation model that simultaneously optimises the agents' skill level and playing style. A feature of this model is that we do not treat skill level and playing style as two objectives to maximise directly, since they are not equally important (agents with diverse playing styles but low skill levels are meaningless). Instead, we build a meta bi-objective model that makes high-level agents with diverse playing styles more likely to be incomparable (i.e., Pareto non-dominated), so that they play against each other throughout the training process. We then present an evolutionary algorithm that works with the proposed model. Experiments in the classic table tennis game Pong and the commercial role-playing game Justice Online show that our algorithm learns a well-generalised policy and, at the same time, provides a set of high-level policies with various playing styles.
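The Pareto non-dominance idea behind the population maintenance can be illustrated with a small sketch. The following Python snippet is an illustrative approximation only, not the paper's actual formulation: the names (Agent, skill, style, style_distance) and the behaviour descriptor are hypothetical placeholders. It treats skill level and style diversity as two objectives and keeps only agents whose objective vectors are non-dominated, so that strong agents with different styles remain incomparable while weaker near-copies of an existing style are filtered out.

```python
# Illustrative sketch (not the paper's actual model): keep a population of agents
# that are Pareto non-dominated with respect to skill level and playing-style
# diversity. All names below are hypothetical placeholders.
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    name: str
    skill: float          # e.g., an Elo-like rating from self-play evaluation
    style: List[float]    # behaviour descriptor, e.g., action-frequency statistics


def style_distance(a: Agent, population: List[Agent]) -> float:
    """Style diversity of an agent: mean L2 distance to the rest of the population."""
    others = [p for p in population if p is not a]
    if not others:
        return 0.0
    dists = [
        sum((x - y) ** 2 for x, y in zip(a.style, p.style)) ** 0.5
        for p in others
    ]
    return sum(dists) / len(dists)


def dominates(obj_a, obj_b) -> bool:
    """True if objective vector obj_a Pareto-dominates obj_b (maximisation)."""
    return all(x >= y for x, y in zip(obj_a, obj_b)) and any(
        x > y for x, y in zip(obj_a, obj_b)
    )


def pareto_front(population: List[Agent]) -> List[Agent]:
    """Keep agents whose (skill, style diversity) vector is non-dominated."""
    objs = {a.name: (a.skill, style_distance(a, population)) for a in population}
    return [
        a for a in population
        if not any(
            dominates(objs[b.name], objs[a.name]) for b in population if b is not a
        )
    ]


if __name__ == "__main__":
    pop = [
        Agent("aggressive", skill=1500, style=[0.8, 0.1]),
        Agent("defensive", skill=1480, style=[0.1, 0.9]),
        # Dominated: similar style to "aggressive" but lower skill.
        Agent("weak-copy", skill=1200, style=[0.75, 0.15]),
    ]
    for a in pareto_front(pop):
        print(a.name)  # prints "aggressive" and "defensive"
```

In this toy run, the aggressive and defensive agents survive because neither dominates the other (one has higher skill, the other higher style diversity), while the low-skill near-duplicate is discarded; this mirrors the abstract's point that diversity alone, without skill, is not rewarded.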
Keywords
Reinforcement learning, evolutionary algorithm, self-play, diverse playing styles, high skill levels