Dual Knowledge Distillation for Neural Machine Translation

COMPUTER SPEECH AND LANGUAGE (2024)

Abstract
Existing knowledge distillation methods use large amounts of bilingual data and focus on mining the corresponding knowledge distribution between the source language and the target language. However, for some languages, bilingual data is not abundant. In this paper, to make better use of both monolingual and limited bilingual data, we propose a new knowledge distillation method called Dual Knowledge Distillation (DKD). For monolingual data, we use a self-distillation strategy that combines self-training and knowledge distillation for the encoder to extract more consistent monolingual representations. For bilingual data, on top of the k-Nearest-Neighbor Knowledge Distillation (kNN-KD) method, a similar self-distillation strategy is adopted as a consistency regularization method to force the decoder to produce consistent output. Experiments on standard datasets, multi-domain translation datasets, and low-resource datasets show that DKD achieves consistent improvements over state-of-the-art baselines, including kNN-KD.
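The abstract describes two distillation paths: a self-distillation consistency term for the encoder on monolingual data, and kNN-KD plus a similar consistency regularizer for the decoder on bilingual data. The sketch below is a minimal, hypothetical PyTorch illustration of the decoder-side objective only, assuming the consistency term is a symmetric KL divergence between two dropout-perturbed decoder passes and that the kNN teacher distribution (`knn_probs`) is computed externally as in kNN-KD; the function name `dual_distillation_loss` and the weights `alpha` and `beta` are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F


def kl(p_log, q_log):
    # KL(q || p), both arguments given as log-probabilities over the vocabulary
    return F.kl_div(p_log, q_log, log_target=True, reduction="batchmean")


def dual_distillation_loss(student_logits_1, student_logits_2, knn_probs,
                           gold_ids, alpha=0.5, beta=0.5, pad_id=0):
    """Hypothetical decoder-side objective combining:
      (i)   NLL on the gold target tokens,
      (ii)  a kNN-KD term distilling the retrieval distribution into the decoder,
      (iii) a self-distillation consistency term between two dropout-perturbed passes.
    Shapes: logits (batch, tgt_len, vocab); knn_probs (batch, tgt_len, vocab);
    gold_ids (batch, tgt_len)."""
    log_p1 = F.log_softmax(student_logits_1, dim=-1)
    log_p2 = F.log_softmax(student_logits_2, dim=-1)

    # (i) standard translation loss on gold targets
    nll = F.nll_loss(log_p1.transpose(1, 2), gold_ids, ignore_index=pad_id)

    # (ii) kNN-KD: cross-entropy between the kNN teacher distribution and the student
    knn_kd = -(knn_probs * log_p1).sum(dim=-1).mean()

    # (iii) consistency regularization: symmetric KL between the two student passes
    consistency = 0.5 * (kl(log_p1, log_p2) + kl(log_p2, log_p1))

    return nll + alpha * knn_kd + beta * consistency
```

In practice the two student passes would come from running the same decoder twice on the same batch with dropout enabled, in the spirit of the consistency regularization the abstract describes; padding positions would also be masked out of the distillation terms.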
Keywords
Knowledge distillation, k-Nearest-Neighbor Knowledge Distillation, Low-resource, Monolingual data