Mitigating Catastrophic Forgetting in Neural Machine Translation Through Teacher-Student Distillation with Attention Mechanism

Quynh-Trang Pham Thi, Ngoc-Huyen Ngo, Anh-Duc Nguyen, Dac-Nhuong Le, Tri-Thanh Nguyen, Quang-Thuy Ha

Communications in Computer and Information Science (2023)

Abstract
Catastrophic forgetting is a critical problem for deep learning models: a model trained on a sequence of tasks forgets previously learned knowledge while being trained on the data of a new task, mainly because the new task is likely to overwrite weights learned in the past. In this research, we propose a novel approach to address this issue for neural machine translation, based on improving the COKD model proposed by S. Shao and Y. Feng (2022). The main idea is to divide the training data into $$n+1$$ parts, train $$n$$ teacher models on the first $$n$$ parts, and let the student model learn from the remaining part. We propose ModifiedCOKD, a method that initializes the teacher model parameters effectively and uses an attention mechanism to distill knowledge from the teacher models to the student model. Experimental results on English-to-Vietnamese translation demonstrate that ModifiedCOKD outperforms the baseline method in mitigating catastrophic forgetting.
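As a rough illustration of the attention-based distillation step described above, the following PyTorch sketch combines the output distributions of several teacher models into a single soft target via attention weights and penalizes the student with a KL-divergence loss. The function name, the dot-product scoring of the student distribution against each teacher distribution, and the tensor shapes are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of attention-weighted multi-teacher distillation (assumed
# PyTorch setup; `attention_distill_loss` and the scoring rule are hypothetical).
import torch
import torch.nn.functional as F

def attention_distill_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Distill from n teacher models into one student.

    student_logits:      (batch, vocab) logits from the student decoder step
    teacher_logits_list: list of n tensors, each (batch, vocab)
    """
    # Stack teachers: (n, batch, vocab)
    teacher_logits = torch.stack(teacher_logits_list, dim=0)

    # Attention scores: similarity between the student distribution and each
    # teacher distribution (a simple dot product here, standing in for the
    # paper's scoring function).
    student_probs = F.softmax(student_logits / temperature, dim=-1)      # (batch, vocab)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)      # (n, batch, vocab)
    scores = torch.einsum("bv,nbv->nb", student_probs, teacher_probs)    # (n, batch)
    weights = F.softmax(scores, dim=0).unsqueeze(-1)                     # (n, batch, 1)

    # Attention-weighted mixture of teacher distributions as the soft target.
    soft_target = (weights * teacher_probs).sum(dim=0)                   # (batch, vocab)

    # KL divergence between the soft target and the student distribution,
    # scaled by temperature^2 as in standard knowledge distillation.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_target, reduction="batchmean") * temperature ** 2
```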
Keywords
catastrophic forgetting, neural machine translation, attention, teacher-student