Not All Data Matters: An End-to-End Adaptive Dataset Pruning Framework for Enhancing Model Performance and Efficiency
CoRR (2023)
Abstract
While deep neural networks have demonstrated remarkable performance across
various tasks, they typically require massive training data. Due to the
presence of redundancies and biases in real-world datasets, not all data in the
training dataset contributes to the model performance. To address this issue,
dataset pruning techniques have been introduced to enhance model performance
and efficiency by eliminating redundant training samples and reducing
computational and memory overhead. However, most previous works rely on
manually crafted scalar scores, limiting their practical performance and
scalability across diverse deep networks and datasets. In this paper, we
propose AdaPruner, an end-to-end Adaptive DAtaset PRUNing framEwoRk. AdaPruner
can perform effective dataset pruning without the need for explicitly defined
metrics. Our framework jointly prunes training data and fine-tunes models with
task-specific optimization objectives. AdaPruner leverages (1) an adaptive
dataset pruning (ADP) module, which iteratively prunes redundant samples to an
expected pruning ratio; and (2) a pruning performance controller (PPC) module,
which optimizes the model performance for accurate pruning. Therefore,
AdaPruner exhibits high scalability and compatibility across various datasets
and deep networks, yielding improved dataset distribution and enhanced model
performance. AdaPruner can still significantly enhance model performance even
after pruning up to 10-30% of the training data. Notably, these improvements
are accompanied by substantial savings in memory and computation costs.
Qualitative and quantitative experiments suggest that AdaPruner outperforms
other state-of-the-art dataset pruning methods by a large margin.
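
The abstract does not give implementation details, but the core idea it describes — learning which samples to keep jointly with fine-tuning the model, rather than scoring samples with a hand-crafted metric — can be sketched in a few lines. The following PyTorch sketch is illustrative only, not the paper's AdaPruner algorithm: the learnable per-sample gates, the retention penalty, and all names and parameters (adaptive_prune, target_ratio, rounds, lam) are assumptions introduced for illustration.

# Minimal illustrative sketch (PyTorch) of joint "prune while you train":
# each sample gets a learnable importance gate; training lowers gates on
# unhelpful samples, and the lowest-gated samples are pruned each round
# until the target ratio is reached. The gating/regularizer scheme here
# is an assumption for illustration, not the paper's method.
import torch
import torch.nn.functional as F

def adaptive_prune(model, dataset, target_ratio=0.2, rounds=5, lam=0.1, lr=1e-3):
    """Return indices of retained samples after pruning `target_ratio`
    of `dataset` over `rounds` equal-sized pruning steps (assumed schedule)."""
    keep_idx = torch.arange(len(dataset))
    scores = torch.zeros(len(dataset), requires_grad=True)  # per-sample logits
    opt = torch.optim.Adam(list(model.parameters()) + [scores], lr=lr)
    per_round = int(len(dataset) * target_ratio / rounds)

    for _ in range(rounds):
        for i in keep_idx.tolist():
            x, y = dataset[i]
            gate = torch.sigmoid(scores[i])  # soft keep-probability in (0, 1)
            ce = F.cross_entropy(model(x.unsqueeze(0)), torch.tensor([y]))
            # Gated loss: a gate only drops toward 0 when the sample's loss
            # exceeds the retention penalty `lam`, which discourages the
            # degenerate solution of gating every sample off.
            loss = gate * ce + lam * (1.0 - gate)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Prune the `per_round` remaining samples with the lowest gates.
        order = scores.detach()[keep_idx].argsort()
        keep_idx = keep_idx[order[per_round:]]
    return keep_idx

In this sketch the retention penalty loosely plays the role the abstract assigns to the pruning performance controller, trading pruning aggressiveness against model performance, though the paper's actual PPC module may work quite differently.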