Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic
arXiv (2021)
Abstract
Actor-critic (AC) algorithms, empowered by neural networks, have had
significant empirical success in recent years. However, most of the existing
theoretical support for AC algorithms focuses on the case of linear function
approximation, or linearized neural networks, where the feature representation
is fixed throughout training. This restriction fails to capture the key aspect
of representation learning in neural AC, which is pivotal in practical
problems. In this work, we take a mean-field perspective on the evolution and
convergence of feature-based neural AC. Specifically, we consider a version of
AC where the actor and critic are represented by overparameterized two-layer
neural networks and are updated with two-timescale learning rates. The critic
is updated by temporal-difference (TD) learning with a larger stepsize while
the actor is updated via proximal policy optimization (PPO) with a smaller
stepsize. In the continuous-time and infinite-width limiting regime, when the
timescales are properly separated, we prove that neural AC finds the globally
optimal policy at a sublinear rate. Additionally, we prove that the feature
representation induced by the critic network is allowed to evolve within a
neighborhood of the initial one.
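
To make the two-timescale structure described in the abstract concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm or analysis setting. It assumes a small random MDP, one-hot state-action features, a two-layer tanh critic with mean-field (1/m) output scaling trained by semi-gradient TD(0), and, as a simplification, a tabular softmax actor updated by the KL-proximal step pi_new(a|s) ∝ pi_old(a|s) exp(eta * Q(s, a)) instead of a second neural network. All names (`critic_value`, `td_update`, `run_two_timescale_ac`) and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np


def critic_value(W, b, x):
    """Two-layer network with mean-field (1/m) output scaling."""
    return np.mean(b * np.tanh(W @ x))


def td_update(W, b, x, target, lr):
    """One semi-gradient TD(0) step; the bootstrapped target is held constant."""
    h = np.tanh(W @ x)                  # hidden activations, shape (m,)
    delta = np.mean(b * h) - target     # TD error
    # Per-neuron gradients of the output, rescaled by m so that each neuron
    # moves at an O(1) rate (an assumed mean-field time scaling).
    grad_b = h
    grad_W = (b * (1.0 - h ** 2))[:, None] * x[None, :]
    return W - lr * delta * grad_W, b - lr * delta * grad_b


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def run_two_timescale_ac(seed=0, steps=2000, m=64, gamma=0.5,
                         lr_critic=0.05, lr_actor=0.005):
    rng = np.random.default_rng(seed)
    n_s, n_a = 3, 2
    P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # transitions, (n_s, n_a, n_s)
    R = rng.uniform(size=(n_s, n_a))                  # rewards in [0, 1)
    feats = np.eye(n_s * n_a).reshape(n_s, n_a, -1)   # one-hot (s, a) features
    W = rng.normal(size=(m, n_s * n_a))               # critic first layer
    b = rng.normal(size=m)                            # critic second layer
    logits = np.zeros((n_s, n_a))                     # tabular actor (simplification)

    s = 0
    for _ in range(steps):
        pi = softmax(logits)
        a = rng.choice(n_a, p=pi[s])
        s_next = rng.choice(n_s, p=P[s, a])
        a_next = rng.choice(n_a, p=pi[s_next])

        # Fast timescale: critic TD(0) step with the larger stepsize.
        target = R[s, a] + gamma * critic_value(W, b, feats[s_next, a_next])
        W, b = td_update(W, b, feats[s, a], target, lr_critic)

        # Slow timescale: KL-proximal actor step with the smaller stepsize.
        # Adding eta * Q to the logits gives pi_new ∝ pi_old * exp(eta * Q).
        q_s = np.array([critic_value(W, b, feats[s, k]) for k in range(n_a)])
        logits[s] += lr_actor * q_s
        s = s_next

    q = np.array([[critic_value(W, b, feats[i, k]) for k in range(n_a)]
                  for i in range(n_s)])
    return softmax(logits), q


if __name__ == "__main__":
    policy, q_values = run_two_timescale_ac()
    print(policy)
    print(q_values)
```

The ratio `lr_critic / lr_actor` is what separates the timescales in this sketch: the critic is meant to track the value of the current policy while the policy itself changes slowly. The multiplicative form of the actor step is the discrete counterpart of the replicator dynamics named in the title.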