Prune Then Distill: Dataset Distillation with Importance Sampling

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
The development of large datasets for various tasks has driven the success of deep learning models, but at the cost of increased label noise, duplication, collection challenges, and storage and training requirements. In this work, we investigate whether all samples in large datasets contribute equally to model accuracy. We study statistical and mathematical techniques that reduce redundancy in datasets by directly optimizing data samples for the generalization accuracy of deep learning models. Existing dataset optimization approaches include analytic methods that remove unimportant samples and synthetic methods that generate new datasets to maximize generalization accuracy. We develop Prune Then Distill, a combination of analytic and synthetic dataset optimization algorithms, and demonstrate up to 15% relative improvement in generalization accuracy over either approach used independently on standard image and audio classification tasks. Additionally, we demonstrate up to 38% improvement in the generalization accuracy of dataset pruning algorithms by maintaining class balance while pruning.
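A minimal sketch of the class-balanced pruning step described in the abstract, assuming per-sample importance scores have already been computed by some analytic pruning criterion; the names class_balanced_prune, scores, and keep_fraction are illustrative and not taken from the paper. The retained subset would then be passed to a dataset distillation routine in place of the full training set.

```python
# Sketch only: class-balanced pruning on precomputed importance scores.
# The scoring rule and the downstream distillation method are not specified here.
import numpy as np

def class_balanced_prune(scores, labels, keep_fraction):
    """Keep the top `keep_fraction` of samples *per class* by importance score,
    so the pruned dataset preserves the original class proportions."""
    keep_idx = []
    for c in np.unique(labels):
        cls_idx = np.flatnonzero(labels == c)
        n_keep = max(1, int(round(keep_fraction * cls_idx.size)))
        # Retain the highest-scoring samples within this class.
        order = np.argsort(scores[cls_idx])[::-1]
        keep_idx.append(cls_idx[order[:n_keep]])
    return np.sort(np.concatenate(keep_idx))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 10, size=5000)   # e.g. a 10-class labeling
    scores = rng.random(5000)                 # stand-in importance scores
    kept = class_balanced_prune(scores, labels, keep_fraction=0.5)
    print(f"kept {kept.size} of {labels.size} samples")
    # The pruned subset indexed by `kept` would then be distilled
    # (e.g. with a gradient-matching distillation method) rather than
    # distilling the full dataset directly.
```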
Keywords
Dataset optimization,Dataset pruning,Dataset distillation,Deep learning