Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning.
CoRR (2023)
Abstract
Knowledge distillation (KD) improves the performance of efficient, lightweight
models (i.e., the student model) by transferring knowledge from more complex
models (i.e., the teacher model). However, most existing KD techniques rely on
Kullback-Leibler (KL)
divergence, which has certain limitations. First, if the teacher distribution
has high entropy, the KL divergence's mode-averaging nature hinders the
transfer of sufficient target information. Second, when the teacher
distribution has low entropy, the KL divergence tends to focus excessively on
specific modes and thus fails to convey enough valuable knowledge
to the student. Consequently, when dealing with datasets that contain numerous
confounding or challenging samples, student models may struggle to acquire
sufficient knowledge, resulting in subpar performance. Furthermore, we observed
that in previous KD approaches, data augmentation, a technique aimed at
enhancing a model's generalization, can have an adverse impact. Therefore, we
propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages
correlation distance and network pruning. This approach enables KD to
effectively incorporate data augmentation for performance improvement.
Extensive experiments on various datasets, including CIFAR-100, FGVR,
TinyImagenet, and ImageNet, demonstrate our method's superiority over current
state-of-the-art methods.
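
To make the contrast between the two objectives concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the exact form of the R2KD loss, so this example assumes a Pearson-style correlation distance (one minus the correlation coefficient) between teacher and student class probabilities, shown next to the usual KL divergence. The function names, the temperature parameter, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_teacher, p_student, eps=1e-12):
    """KL(teacher || student), the objective most KD methods rely on."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(p_teacher, p_student)
    )


def correlation_distance(u, v, eps=1e-12):
    """Assumed form: 1 - Pearson correlation between two vectors.

    This is an illustrative choice, not necessarily the distance
    used in the paper.
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return 1.0 - cov / (norm_u * norm_v + eps)


# Hypothetical logits for a 4-class problem.
teacher_probs = softmax([4.0, 1.0, 0.5, 0.2], temperature=2.0)
student_probs = softmax([2.5, 1.5, 0.3, 0.1], temperature=2.0)

print("KL divergence:       ", kl_divergence(teacher_probs, student_probs))
print("Correlation distance:", correlation_distance(teacher_probs, student_probs))
```

Under this assumed definition, the correlation distance depends only on how the two probability vectors co-vary across classes, so it is unchanged when either vector is shifted or rescaled by a positive constant, whereas the KL divergence compares the probabilities term by term.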