Demystifying the Recency Heuristic in Temporal-Difference Learning
arXiv (2024)
Abstract
The recency heuristic in reinforcement learning is the assumption that
stimuli that occurred closer in time to an acquired reward should be more
heavily reinforced. The recency heuristic is one of the key assumptions made by
TD(λ), which reinforces recent experiences according to an
exponentially decaying weighting. In fact, all other widely used return
estimators for TD learning, such as n-step returns, satisfy a weaker (i.e.,
non-monotonic) recency heuristic. Why is the recency heuristic effective for
temporal credit assignment? What happens when credit is assigned in a way that
violates this heuristic? In this paper, we analyze the specific mathematical
implications of adopting the recency heuristic in TD learning. We prove that
any return estimator satisfying this heuristic: 1) is guaranteed to converge to
the correct value function, 2) has a relatively fast contraction rate, and 3)
has a long window of effective credit assignment, yet bounded worst-case
variance. We also give a counterexample where on-policy, tabular TD methods
violating the recency heuristic diverge. Our results offer some of the first
theoretical evidence that credit assignment based on the recency heuristic
facilitates learning.
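To make the exponentially decaying weighting concrete: TD(λ) targets the λ-return, G_t^λ = (1 - λ) Σ_{n≥1} λ^(n-1) G_t^(n), a geometric mixture of n-step returns, which is equivalent to updating every previously visited state with a credit that decays by γλ per elapsed step. The sketch below is a minimal tabular, on-policy TD(λ) with accumulating eligibility traces, written to illustrate that recency weighting; the environment interface (num_states, reset, step) is a hypothetical stand-in for illustration, not an API from the paper.

import numpy as np

def td_lambda(env, policy, num_episodes, alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular on-policy TD(lambda) with accumulating eligibility traces.

    Credit for each TD error decays exponentially (by gamma * lam per
    step) over earlier states -- the recency heuristic the paper analyzes.
    Assumes a hypothetical env with num_states, reset() -> s, and
    step(a) -> (s_next, r, done).
    """
    V = np.zeros(env.num_states)
    for _ in range(num_episodes):
        e = np.zeros(env.num_states)  # eligibility traces, one per state
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            # One-step TD error (no bootstrap from terminal states)
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            # Bump the trace for the current state...
            e[s] += 1.0
            # ...apply the TD error to all states in proportion to their traces...
            V += alpha * delta * e
            # ...and decay every trace: the exponentially decaying recency weighting
            e *= gamma * lam
            s = s_next
    return V

Setting lam = 0 recovers one-step TD(0) (only the most recent state is credited), while lam = 1 spreads credit over the whole episode, Monte Carlo-style; the paper's results concern estimators whose weights, like this geometric decay, never increase with temporal distance from the reward.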