On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
CVPR 2024 (2023)
Abstract
Contemporary machine learning requires training large neural networks on
massive datasets and thus faces the challenge of high computational demands.
Dataset distillation, a recently emerging strategy, aims to compress
real-world datasets for efficient training. However, this line of research
currently struggles with large-scale and high-resolution datasets, hindering its
practicality and feasibility. To this end, we re-examine the existing dataset
distillation methods and identify three properties required for large-scale
real-world applications, namely, realism, diversity, and efficiency. As a
remedy, we propose RDED, a novel computationally efficient yet effective data
distillation paradigm, to enable both diversity and realism of the distilled
data. Extensive empirical results over various neural architectures and
datasets demonstrate the advancement of RDED: we can distill the full
ImageNet-1K to a small dataset comprising 10 images per class within 7 minutes,
achieving a notable 42% top-1 accuracy with ResNet-18
(while the SOTA only achieves 21% but requires 6 hours).
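To make the three properties concrete, below is a minimal, hypothetical sketch of a patch-based distillation step in the spirit of RDED: each distilled image is tiled from high-confidence crops taken from distinct real images (realism and diversity), using only cheap forward passes through a pretrained observer network (efficiency). The function names, the 2x2 tiling, and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: assemble one distilled image per class from
# realistic, diverse patches of real images, scored by a pretrained observer.
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T

def score_patches(patches: torch.Tensor, label: int,
                  observer: torch.nn.Module) -> torch.Tensor:
    """Score each patch by the observer's confidence in the true class."""
    with torch.no_grad():
        logits = observer(patches)
    return F.softmax(logits, dim=1)[:, label]

def distill_class(images: torch.Tensor, label: int, observer: torch.nn.Module,
                  crops_per_image: int = 4,
                  patches_per_distilled: int = 4) -> torch.Tensor:
    """Tile one distilled (3, H, W) image for `label` from crops of
    the real (N, 3, H, W) images of that class."""
    _, _, h, w = images.shape
    crop = T.RandomResizedCrop((h // 2, w // 2))
    best_patches, best_scores = [], []
    # Keep only the best crop of each source image, so the final patches
    # come from distinct real images (diversity).
    for img in images:
        cands = torch.stack([crop(img) for _ in range(crops_per_image)])
        scores = score_patches(
            F.interpolate(cands, size=(h, w), mode="bilinear"), label, observer)
        best = scores.argmax()
        best_patches.append(cands[best])
        best_scores.append(scores[best])
    # Pick the top-scoring patches across source images and tile them 2x2
    # into one distilled image (assumes patches_per_distilled == 4).
    order = torch.stack(best_scores).argsort(descending=True)[:patches_per_distilled]
    p = [best_patches[i] for i in order]
    top = torch.cat([p[0], p[1]], dim=2)
    bottom = torch.cat([p[2], p[3]], dim=2)
    return torch.cat([top, bottom], dim=1)

if __name__ == "__main__":
    observer = models.resnet18(
        weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()
    class_images = torch.rand(16, 3, 224, 224)  # stand-in for real data
    distilled = distill_class(class_images, label=0, observer=observer)
    print(distilled.shape)  # torch.Size([3, 224, 224])
```

Because selection needs only forward passes and simple cropping rather than the bi-level optimization used by many prior distillation methods, a sketch like this scales to high-resolution data, which is the efficiency argument the abstract makes.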