Neural Networks Learn Statistics of Increasing Complexity
arXiv (2024)
Abstract
The distributional simplicity bias (DSB) posits that neural networks learn
low-order moments of the data distribution first, before moving on to
higher-order correlations. In this work, we present compelling new evidence for
the DSB by showing that networks automatically learn to perform well on
maximum-entropy distributions whose low-order statistics match those of the
training set early in training, then lose this ability later. We also extend
the DSB to discrete domains by proving an equivalence between token n-gram
frequencies and the moments of embedding vectors, and by finding empirical
evidence for the bias in LLMs. Finally, we use optimal transport methods to
surgically edit the low-order statistics of one class to match those of
another, and show that early-training networks treat the edited samples as if
they were drawn from the target class. Code is available at
https://github.com/EleutherAI/features-across-time.
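
The evaluation on maximum-entropy surrogates can be made concrete: for continuous inputs, the maximum-entropy distribution with a fixed mean and covariance is a multivariate Gaussian, so a surrogate test set matching a class's first two moments is just a Gaussian fit to that class. The sketch below illustrates this idea only; the function name and the flattened-image setup are assumptions of mine, not taken from the paper's repository.

```python
import numpy as np

def sample_matched_gaussian(x, n_samples, seed=0):
    """Sample from the maximum-entropy distribution matching the first two
    moments (mean and covariance) of the data in `x`: the multivariate
    Gaussian N(mean(x), cov(x)).

    x: array of shape (n_points, n_features), e.g. flattened images of one class.
    """
    rng = np.random.default_rng(seed)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # Default (SVD-based) sampling tolerates a singular covariance matrix.
    return rng.multivariate_normal(mean, cov, size=n_samples)
```

Early in training, a network's accuracy on such surrogate samples should track its accuracy on real data if it has only learned low-order statistics so far.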
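
For the class-editing experiment, if "low-order statistics" means the first two moments, the optimal transport map between the two Gaussian approximations has a closed form (the Bures-Wasserstein map between N(mu_s, cov_s) and N(mu_t, cov_t)). The sketch below assumes that reading and is not taken from the paper's code.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_ot_map(x, mu_s, cov_s, mu_t, cov_t):
    """Apply the closed-form optimal transport map between two Gaussians,
    N(mu_s, cov_s) -> N(mu_t, cov_t), to the points in `x`:

        T(x) = mu_t + A (x - mu_s),
        A = cov_s^{-1/2} (cov_s^{1/2} cov_t cov_s^{1/2})^{1/2} cov_s^{-1/2}.

    Applied to samples of a source class, this matches their mean and
    covariance to the target class while moving each point as little as
    possible in the squared-Euclidean sense.
    """
    s_half = sqrtm(cov_s).real
    s_half_inv = np.linalg.inv(s_half)
    a = s_half_inv @ sqrtm(s_half @ cov_t @ s_half).real @ s_half_inv
    return mu_t + (x - mu_s) @ a.T
```

Feeding such edited samples to an early-training network and checking whether it predicts the target class is one way to test the paper's claim.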