Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization
IEEE Control Systems Letters (2024)
Abstract
This work focuses on the entropy-regularized independent natural policy
gradient (NPG) algorithm in multi-agent reinforcement learning. Agents are
assumed to have access to an oracle with exact policy evaluation and
seek to maximize their respective independent rewards. Each individual's reward
is assumed to depend on the actions of all the agents in the multi-agent
system, leading to a game between agents. We assume all agents make decisions
under a policy with bounded rationality, which is enforced by the introduction
of entropy regularization. In practice, a smaller regularization implies the
agents are more rational and behave closer to Nash policies. On the other hand,
agents with larger regularization act more randomly, which ensures more
exploration. We show that, under sufficient entropy regularization, the
dynamics of this system converge at a linear rate to the quantal response
equilibrium (QRE). Although regularization assumptions prevent the QRE from
approximating a Nash equilibrium, our findings apply to a wide range of games,
including cooperative, potential, and two-player matrix games. We also provide
extensive empirical results on multiple games (including Markov games) as a
verification of our theoretical analysis.
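
The dynamics described above can be illustrated on a two-player matrix game. The sketch below is not the paper's code. It assumes the multiplicative form of entropy-regularized NPG commonly used with softmax policies, pi_new(a) proportional to pi(a)^(1 - eta*tau) * exp(eta * r(a)), where r(a) is the exact expected reward of action a against the other agent's current policy. The payoff matrices, the step size eta, the regularization weight tau, and the function names are all illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def independent_npg(A, B, tau=2.0, eta=0.1, steps=500):
    """Simultaneous entropy-regularized NPG for a two-player matrix game.

    A[i, j] is the reward to player 1 and B[i, j] the reward to player 2
    when player 1 plays action i and player 2 plays action j.
    tau and eta are hypothetical values chosen for illustration.
    """
    n, m = A.shape
    x = np.full(n, 1.0 / n)  # player 1 policy, uniform start
    y = np.full(m, 1.0 / m)  # player 2 policy, uniform start
    for _ in range(steps):
        r1 = A @ y    # exact expected reward of each action of player 1
        r2 = B.T @ x  # exact expected reward of each action of player 2
        # Each agent updates independently using only its own rewards.
        # Log-domain form of pi^(1 - eta*tau) * exp(eta * r), then normalize.
        x_new = softmax((1.0 - eta * tau) * np.log(x) + eta * r1)
        y_new = softmax((1.0 - eta * tau) * np.log(y) + eta * r2)
        x, y = x_new, y_new
    return x, y

def qre_gap(A, B, x, y, tau):
    # At a logit QRE each policy is the softmax of its own expected reward
    # divided by tau; return the largest violation of that condition.
    gap_x = np.abs(x - softmax(A @ y / tau)).max()
    gap_y = np.abs(y - softmax(B.T @ x / tau)).max()
    return max(gap_x, gap_y)

# Illustrative general-sum game with rewards scaled into [0, 1].
A = np.array([[0.6, 0.0], [1.0, 0.2]])
B = A.T.copy()
x, y = independent_npg(A, B, tau=2.0)
print("player 1 policy:", x)
print("player 2 policy:", y)
print("QRE gap:", qre_gap(A, B, x, y, tau=2.0))
```

With tau large relative to the reward scale, as in this example, the QRE gap should shrink geometrically with the number of steps, which is the qualitative behavior the abstract describes. Lowering tau toward zero moves the setting outside the "sufficient regularization" regime, and this sketch makes no claim about convergence there.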
Key words
Game Theory, Multi-Agent Reinforcement Learning, Natural Policy Gradient, Quantal Response Equilibrium