The Role of a Reward in Shaping Multiple Football Agents' Behavior: An Empirical Study

APPLIED SCIENCES-BASEL(2023)

引用 0|浏览7
暂无评分
摘要
In reinforcement learning (RL), a reward formed with a scalar value is seen as a sufficient means to guide an agent's behavior. A reward drives an agent to seek out an optimal policy to solve a problem (or to achieve a goal) under uncertainty. In this paper, we aimed to probe the benefit of such a scalar reward in the shaping of coordination policy using artificial football scenarios. In a football setting, a team normally practices two types of strategies: one is a primary formation, that is, the default strategy of a team regardless of their opponents (e.g., 4-4-2, 4-3-3), and the other is an adaptive strategy, that is, a reactive tactic responding to the spontaneous changes of their opponents. We focused here on the primary formation as a team coordination policy that can be trained by a reward using multi-agent RL (MARL) algorithms. Once a team of multiple football agents has successfully learned a primary formation based on a reward-driven approach, we assumed that the team is able to exhibit the primary formation when facing various opponent teams they have never faced in due course to receive a reward. To precisely examine this behavior, we conducted a large number of simulations with twelve artificial football teams in an AI world cup environment. Here, we trained two MARL-based football teams with a team guided by a random walk formation. Afterwards, we performed the artificial football matches with the most competitive of the twelve teams that the MARL-based teams had never played against. Given the analyses of the performance of each football team with regard to their average score and competitiveness, the results showed that the proposed MARL teams outperformed the others with respect to competitiveness, although these teams were not the best with respect to the average score. This indicated that the coordination policy of the MARL-based football teams was moderately consistent against both known and unknown opponents due to the successful learning of a primary formation following the guidance of a scalar reward.
更多
查看译文
关键词
shaping multiple football agents,reward,behavior
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要