
Mitigating Bias in Dataset Distillation

arXiv (2024)

Abstract
Dataset distillation has emerged as a technique for compressing large datasets into smaller synthetic counterparts, facilitating downstream training tasks. In this paper, we study the impact of bias in the original dataset on the performance of dataset distillation. Through a comprehensive empirical evaluation on canonical datasets with color, corruption, and background biases, we find that color and background biases in the original dataset are amplified through the distillation process, resulting in a notable decline in the performance of models trained on the distilled dataset, while corruption bias is suppressed through distillation. To reduce bias amplification in dataset distillation, we introduce a simple yet highly effective approach based on a sample reweighting scheme that utilizes kernel density estimation. Empirical results on multiple real-world and synthetic datasets demonstrate the effectiveness of the proposed method. Notably, on CMNIST with a 5% bias-conflicting ratio, our method achieves substantially higher test accuracy compared to 23.8% from the vanilla distillation baseline, whereas applying a state-of-the-art debiasing method to the same dataset only achieves 53.7%. Our findings highlight the importance of addressing biases in dataset distillation and provide a promising avenue to address bias amplification in the process.
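
To make the reweighting idea concrete, below is a minimal sketch of KDE-based sample reweighting. It assumes per-sample feature vectors (e.g., embeddings of the biased attribute such as color) and up-weights low-density, bias-conflicting samples by the inverse of their estimated density, then uses those weights in a simple distribution-matching-style objective. The function names, the inverse-density rule, and the weighted mean-matching loss are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def kde_sample_weights(features, bandwidth=0.5):
    """Estimate per-sample density with a Gaussian KDE and return weights
    inversely proportional to that density, so rare (bias-conflicting)
    samples receive larger weight during distillation."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(features)
    log_density = kde.score_samples(features)   # log p(x_i) under the KDE
    density = np.exp(log_density)
    weights = 1.0 / (density + 1e-12)            # up-weight low-density samples
    weights *= len(weights) / weights.sum()      # normalize to mean 1
    return weights


def weighted_matching_loss(real_feats, syn_feats, weights):
    """Match the weighted mean of real features to the mean of synthetic
    features (a distribution-matching-style objective with reweighting)."""
    w = weights / weights.sum()
    real_mean = (w[:, None] * real_feats).sum(axis=0)
    syn_mean = syn_feats.mean(axis=0)
    return np.sum((real_mean - syn_mean) ** 2)


# Example usage on random stand-in features:
real_feats = np.random.randn(1000, 64)
syn_feats = np.random.randn(50, 64)
weights = kde_sample_weights(real_feats)
loss = weighted_matching_loss(real_feats, syn_feats, weights)
```

In practice the weights would multiply the per-sample terms of whichever distillation objective is used (e.g., gradient or distribution matching), so that majority-bias samples contribute less and the synthetic set no longer over-represents the biased attribute.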