The Randomness of Input Data Spaces is an A Priori Predictor for Generalization

KI 2022: Advances in Artificial Intelligence (2022)

Abstract
Over-parameterized models can perfectly learn various types of data distributions; however, generalization error is usually lower for real data than for artificial data. This suggests that the properties of a data distribution affect generalization capability. This work focuses on the search space defined by the input data and assumes that the correlation between labels of neighboring input values influences generalization: if this correlation is low, the randomness of the input data space is high, leading to high generalization error. We suggest measuring the randomness of an input data space using Maurer's universal statistical test. Results for synthetic classification tasks and common image classification benchmarks (MNIST, CIFAR10, and Microsoft's cats vs. dogs data set) show a high correlation between the randomness of input data spaces and the generalization error of deep neural networks on binary classification problems.
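The randomness measure named in the abstract, Maurer's universal statistical test, scores a bit sequence by the average log-distance between repeated fixed-length blocks: highly random (incompressible) sequences show large distances, highly regular ones small distances. A minimal sketch of the test statistic is given below; this is an illustrative implementation of the standard test, not the authors' code, and the parameter defaults (block length `L`, initialization length `Q`) are assumptions following common practice.

```python
import math

def maurer_universal(bits, L=4, Q=None):
    """Maurer's universal statistical test statistic f_n for a 0/1 sequence.

    bits: sequence of 0/1 ints; L: block length in bits;
    Q: number of initialization blocks (default 10 * 2**L, a common choice).
    Returns the average log2 distance between repeated L-bit blocks;
    higher values indicate a more random (less compressible) sequence.
    """
    if Q is None:
        Q = 10 * 2 ** L  # assumed default, as commonly recommended
    n_blocks = len(bits) // L
    K = n_blocks - Q  # number of test blocks
    if K <= 0:
        raise ValueError("sequence too short for the chosen L and Q")

    def block(i):
        # Interpret the i-th (0-based) non-overlapping L-bit block as an integer.
        value = 0
        for b in bits[i * L:(i + 1) * L]:
            value = (value << 1) | b
        return value

    last_seen = {}
    # Initialization segment: record the last position of each pattern
    # (blocks are numbered from 1, as in the usual formulation).
    for i in range(1, Q + 1):
        last_seen[block(i - 1)] = i
    # Test segment: accumulate log2 of the distance to the previous occurrence.
    total = 0.0
    for i in range(Q + 1, Q + K + 1):
        p = block(i - 1)
        total += math.log2(i - last_seen.get(p, 0))
        last_seen[p] = i
    return total / K
```

For example, a constant sequence yields a statistic of 0 (every block repeats at distance 1), while sequences with less regular block structure score higher, matching the abstract's use of the statistic as a randomness score for input data spaces.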
Keywords
Deep learning, Label landscape, Generalization