Decentralized Adaptive TD$(\lambda)$ Learning With Linear Function Approximation: Nonasymptotic Analysis

IEEE Transactions on Systems, Man, and Cybernetics: Systems (2024)

Abstract
In multiagent reinforcement learning, policy evaluation is a central problem, and decentralized temporal-difference (TD) learning, which has attracted considerable attention in recent years, is one of the most popular methods for solving it. However, existing decentralized variants of TD learning often converge slowly because their performance is sensitive to the choice of learning rates. Inspired by the success of adaptive gradient methods in training deep neural networks, this article proposes a decentralized adaptive TD$(\lambda)$ learning algorithm for general $\lambda$ with linear function approximation, referred to as D-AMSTD$(\lambda)$, which mitigates this sensitivity to learning-rate selection. Furthermore, we establish finite-time performance bounds for D-AMSTD$(\lambda)$ under the Markovian observation model. The theoretical results show that D-AMSTD$(\lambda)$ converges linearly to an arbitrarily small neighborhood of the optimal weight. Finally, we verify the efficacy of D-AMSTD$(\lambda)$ through a variety of experiments, in which it outperforms existing decentralized TD learning methods.
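The name D-AMSTD$(\lambda)$ suggests a combination of TD$(\lambda)$ eligibility traces with AMSGrad-style adaptive step sizes, although the abstract does not spell out the update rule. The Python snippet below is a minimal, hypothetical single-agent sketch of those two ingredients under linear function approximation: an eligibility-trace TD semi-gradient scaled per coordinate by AMSGrad-style moment estimates. It omits the consensus (mixing) step of the decentralized setting, and every name and constant in it (amstd_lambda_update, alpha, beta1, beta2) is an illustrative assumption, not the authors' algorithm.

```python
# Hypothetical sketch of an adaptive TD(lambda) update with linear function
# approximation and AMSGrad-style per-coordinate scaling. Single-agent only;
# the decentralized consensus step is intentionally omitted.
import numpy as np

def amstd_lambda_update(w, z, m, v, v_hat, phi, phi_next, reward,
                        gamma=0.95, lam=0.9, alpha=0.01,
                        beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative TD(lambda) step with AMSGrad-style adaptive scaling."""
    # Accumulating eligibility trace over the linear features.
    z = gamma * lam * z + phi
    # TD error under the linear value estimate V(s) = w^T phi(s).
    delta = reward + gamma * phi_next @ w - phi @ w
    # Semi-gradient direction suggested by TD(lambda).
    g = delta * z
    # AMSGrad-style first/second moment estimates with a running max of v.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    v_hat = np.maximum(v_hat, v)
    # Adaptive, per-coordinate weight update.
    w = w + alpha * m / (np.sqrt(v_hat) + eps)
    return w, z, m, v, v_hat

# Toy usage on random features (purely illustrative).
rng = np.random.default_rng(0)
d = 8
w = np.zeros(d); z = np.zeros(d)
m = np.zeros(d); v = np.zeros(d); v_hat = np.zeros(d)
phi = rng.normal(size=d)
for _ in range(100):
    phi_next = rng.normal(size=d)
    reward = rng.normal()
    w, z, m, v, v_hat = amstd_lambda_update(w, z, m, v, v_hat,
                                            phi, phi_next, reward)
    phi = phi_next
```

The running maximum v_hat is what distinguishes AMSGrad from Adam: it keeps the effective per-coordinate step size nonincreasing, which is the property that typically makes finite-time convergence analyses of such adaptive updates tractable.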
Key words
Finite-time bounds, multiagent reinforcement learning (MARL), policy evaluation, temporal-difference (TD) learning