Enhanced Sparsification via Stimulative Training
arXiv (2024)
Abstract
Sparsification-based pruning has been an important category in model
compression. Existing methods commonly set sparsity-inducing penalty terms to
suppress the importance of dropped weights, which is regarded as the suppressed
sparsification paradigm. However, this paradigm deactivates the dropped parts
of the network, damaging its capacity before pruning and thereby leading to
performance degradation. To alleviate this issue, we first study and reveal the
relative sparsity effect in emerging stimulative training and then propose a
structured pruning framework, named STP, based on an enhanced sparsification
paradigm which maintains the magnitude of dropped weights and enhances the
expressivity of kept weights by self-distillation. In addition, to find an optimal
architecture for the pruned network, we propose a multi-dimension architecture
space and a knowledge-distillation-guided exploration strategy. To reduce the
large capacity gap in distillation, we propose a subnet mutating expansion
technique. Extensive experiments on various benchmarks indicate the
effectiveness of STP. Specifically, without fine-tuning, our method
consistently achieves superior performance at different budgets, especially
under extremely aggressive pruning scenarios, e.g., remaining 95.11
accuracy (72.43
Code will be released soon.