Order parameters and phase transitions of continual learning in deep neural networks
arXiv (2024)
Abstract
Continual learning (CL) enables animals to learn new tasks without erasing
prior knowledge. CL in artificial neural networks (NNs) is challenging due to
catastrophic forgetting, where new learning degrades performance on older
tasks. While various techniques exist to mitigate forgetting, theoretical
insights into when and why CL fails in NNs are lacking. Here, we present a
statistical-mechanics theory of CL in deep, wide NNs, which characterizes the
network's input-output mapping as it learns a sequence of tasks. It gives rise
to order parameters (OPs) that capture how task relations and network
architecture influence forgetting and knowledge transfer, as verified by
numerical evaluations. We find that input similarity and rule similarity between
tasks have distinct effects on CL performance. In addition, the theory
predicts that increasing the network depth can effectively reduce overlap
between tasks, thereby lowering forgetting. For networks with task-specific
readouts, the theory identifies a phase transition where CL performance shifts
dramatically as tasks become less similar, as measured by the OPs. Sufficiently
low similarity leads to catastrophic anterograde interference, where the
network retains old tasks perfectly but completely fails to generalize what it
learns on new tasks. Our results delineate important factors affecting CL performance and
suggest strategies for mitigating forgetting.
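
The central phenomenon in the abstract, catastrophic forgetting under sequential training and its dependence on the rule similarity between tasks, can be illustrated with a toy experiment. The sketch below is not the paper's theory: it trains a small two-layer network with a shared readout on two random linear rules in sequence, reports how training on task B degrades task A accuracy, and prints the cosine overlap between the two rules as a crude stand-in for the paper's order parameters. All function names and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the paper's method): sequential training on two tasks
# to exhibit catastrophic forgetting. Hyperparameters are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)

def make_task(d=20, n=200, rule=None):
    """Random linear rule: label = sign(w . x)."""
    X = rng.standard_normal((n, d))
    w = rule if rule is not None else rng.standard_normal(d)
    return X, np.sign(X @ w), w

def mlp_init(d, h=100):
    # Two-layer tanh network with a single shared readout.
    return {"W1": rng.standard_normal((d, h)) / np.sqrt(d),
            "w2": rng.standard_normal(h) / np.sqrt(h)}

def forward(p, X):
    return np.tanh(X @ p["W1"]) @ p["w2"]

def train(p, X, y, lr=0.05, epochs=500):
    # Full-batch gradient descent on the squared loss.
    for _ in range(epochs):
        H = np.tanh(X @ p["W1"])           # hidden activations, shape (n, h)
        err = H @ p["w2"] - y              # residual, shape (n,)
        p["w2"] -= lr * H.T @ err / len(y)
        dH = np.outer(err, p["w2"]) * (1 - H**2)
        p["W1"] -= lr * X.T @ dH / len(y)
    return p

def accuracy(p, X, y):
    return np.mean(np.sign(forward(p, X)) == y)

d = 20
XA, yA, wA = make_task(d)
# Task B's rule partially overlaps task A's; shrinking the 0.3 mixing
# coefficient makes the rules less similar.
wB = 0.3 * wA + rng.standard_normal(d)
XB, yB, _ = make_task(d, rule=wB)

p = mlp_init(d)
p = train(p, XA, yA)
acc_A_before = accuracy(p, XA, yA)
p = train(p, XB, yB)                       # continue training on task B
print(f"task A accuracy before/after task B: "
      f"{acc_A_before:.2f} / {accuracy(p, XA, yA):.2f}")
cos = wA @ wB / (np.linalg.norm(wA) * np.linalg.norm(wB))
print(f"rule overlap cos(wA, wB): {cos:.2f}")
```

In this toy single-readout setup, decreasing the mixing coefficient lowers the rule overlap and typically worsens forgetting on task A, echoing the rule-similarity effect described above; the paper's order parameters and phase transition arise from an exact analysis of deep, wide networks rather than from this kind of simulation.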