Alleviating Exposure Bias for Neural Machine Translation via Contextual Augmentation and Self Distillation

IEEE/ACM Trans. Audio Speech Lang. Process. (2023)

Abstract
In neural machine translation (NMT), most sequence-to-sequence (seq2seq) models are trained only with the teacher-forcing paradigm, where the ground-truth history is used to predict the next ground-truth word. At the inference stage, however, the decoder predicts the next token solely based on history it has generated from scratch. Both relying on ground-truth history and predicting only ground-truth words potentially lead to exposure bias. On the one hand, to alleviate the exposure bias caused by using ground-truth history, we propose contextual augmentation, which allows substitution, insertion, and deletion of words. Contextual augmentation is applied to the target sequence to generate non-ground-truth yet natural history for predicting the next word. On the other hand, to alleviate the exposure bias caused by predicting ground-truth words, we further apply self-distillation to guide the model to optimize against a smoothed prediction distribution, i.e., to predict not only the ground-truth word but also other potentially correct and reasonable words. Experimental results on the WMT14 English→German and IWSLT14 German→English translation tasks demonstrate that our approach achieves significant improvements over the Transformer on standard benchmarks. Detailed experimental analyses further confirm the effectiveness of the proposed approach in improving translation quality.
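To make the two ideas concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' released implementation. The function `contextual_augmentation` perturbs the target-side history with deletion, substitution, and insertion; for brevity it samples replacement tokens uniformly, whereas the paper's "natural" history would come from a contextually informed distribution (e.g., a language model or the model's own predictions). The function `self_distillation_loss` mixes cross-entropy on the ground truth with a KL term toward a teacher's temperature-smoothed distribution; all names, the perturbation rate `p`, the mixing weight `alpha`, and the temperature are illustrative assumptions.

```python
import random

import torch.nn.functional as F


def contextual_augmentation(tokens, vocab_size, p=0.1):
    """Perturb a target-side token sequence with word-level deletion,
    substitution, and insertion so the decoder trains on non-ground-truth
    history (hypothetical sketch; rates and sampling are assumptions)."""
    out = []
    for tok in tokens:
        r = random.random()
        if r < p / 3:
            continue  # deletion: drop this ground-truth token
        elif r < 2 * p / 3:
            # substitution: a "natural" replacement would be sampled from a
            # contextual distribution; we sample uniformly for brevity
            out.append(random.randrange(vocab_size))
        elif r < p:
            out.append(tok)
            out.append(random.randrange(vocab_size))  # insertion after token
        else:
            out.append(tok)  # keep the token unchanged
    return out


def self_distillation_loss(student_logits, teacher_logits, gold,
                           alpha=0.5, temperature=2.0):
    """Mix cross-entropy on the ground-truth words with KL divergence toward
    a frozen teacher's temperature-smoothed prediction distribution.
    Shapes: student_logits/teacher_logits (num_tokens, vocab); gold (num_tokens,)."""
    ce = F.cross_entropy(student_logits, gold)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kl
```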
Keywords
History, Predictive models, Training, Decoding, Task analysis, Machine translation, Perturbation methods, Contextual augmentation, exposure bias, neural machine translation, self distillation