Lifelong language learning with adaptive uncertainty regularization

Information Sciences (2023)

Abstract
It has been a long-standing goal in natural language processing (NLP) to learn a general linguistic intelligence model that can perform well on many different NLP tasks continually evolving over time while avoiding revisiting all previous data at each stage. Most existing deep neural networks suffer from catastrophic forgetting when dealing with sequential tasks in an incremental way, leading to dramatic performance degradation due to the missing training data of old tasks. In this paper, we propose a Lifelong language method with Adaptive Uncertainty Regularization (LAUR), which can adapt a single BERT model to work with continuously arriving text examples from different NLP tasks. Specifically, LAUR is built on the Bayesian online learning framework, and three uncertainty regularization terms are devised to collaboratively control the parameters so as to resolve the stability-plasticity dilemma in lifelong language learning. The previous posterior constrains parameters that strongly determine the output results, preventing these parameters from changing drastically, while other parameters are encouraged to be updated over time. In addition, we propose a task-specific residual adaptation module in parallel to each layer of BERT to endow LAUR with the capacity to learn better task-specific knowledge. This configuration makes LAUR less prone to losing the knowledge stored in the base BERT network when learning a new task. Experimental results show that LAUR outperforms state-of-the-art lifelong learning models on a variety of NLP tasks. For reproducibility, we submit the code and data at: https://github.com/kiujhytgtrfd2021/LAUR. (c) 2022 Elsevier Inc. All rights reserved.
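The two mechanisms the abstract names can be sketched concretely. In Bayesian online learning, a common form of uncertainty regularization penalizes drift of each parameter from the previous posterior mean, weighted by that parameter's posterior precision, so that "important" (high-precision) parameters stay stable while uncertain ones remain plastic; a parallel residual adapter adds a small trainable bottleneck alongside a frozen base layer. The following is a minimal NumPy illustration only, not the authors' released code, and all function and variable names here are hypothetical:

```python
import numpy as np

def uncertainty_penalty(theta, theta_prev, precision):
    """Quadratic drift penalty weighted by the previous posterior precision.

    High-precision parameters (those the old posterior is confident about)
    are held near their previous values; low-precision parameters are left
    free to adapt to the new task.
    """
    return 0.5 * np.sum(precision * (theta - theta_prev) ** 2)

def parallel_residual_adapter(x, W_frozen, W_down, W_up):
    """Task-specific residual adaptation in parallel with a frozen layer.

    The frozen base transformation is left untouched; a small ReLU
    bottleneck adapter runs on the same input, and the two outputs are
    summed: output = frozen_layer(x) + adapter(x).
    """
    frozen_out = x @ W_frozen                          # frozen base path
    adapter_out = np.maximum(x @ W_down, 0.0) @ W_up   # trainable bottleneck path
    return frozen_out + adapter_out
```

Because the adapter is additive, initializing `W_up` to zeros makes the module start as an identity perturbation of the frozen layer, which is a standard way to avoid disturbing pretrained knowledge at the start of a new task.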
Keywords
Lifelong language learning, Adaptive uncertainty regularization, Residual parallel adaptation