Artificial neural network language models align neurally and behaviorally with humans even after a developmentally realistic amount of training

bioRxiv (2022)

Abstract
Artificial neural networks have emerged as computationally plausible models of human language processing. A major criticism of these models is that the amount of training data they receive far exceeds that of humans during language learning. Here, we use two complementary approaches to ask how the models’ ability to capture human neural and behavioral responses to language is affected by the amount of training data. First, we evaluate GPT-2 models trained on 1 million, 10 million, 100 million, or 1 billion tokens against two fMRI benchmarks and one behavioral (reading times) benchmark. Because children are exposed to approximately 100 million words during the first 10 years of life, we consider the 100-million-token model developmentally plausible. Second, we test the performance of a GPT-2 model that is trained on a 9-billion-token dataset to reach state-of-the-art next-word prediction performance against the same human benchmarks at different stages during training. Across both approaches, we find that (i) the models trained on a developmentally plausible amount of data already achieve near-maximal performance in capturing neural and behavioral responses to language. Further, (ii) lower perplexity—a measure of next-word prediction performance—is associated with stronger alignment with the human benchmarks, suggesting that models that have received enough training to achieve sufficiently high next-word prediction performance also acquire human-like representations of the linguistic input. In tandem, these findings establish that although some training is necessary for the models’ ability to predict human responses to language, a developmentally realistic amount of training (∼100 million tokens) may suffice.

Summary

Are artificial neural network (ANN) language models useful as models of human language processing? Some of these models have been shown to capture human responses to language with relatively high accuracy. However, these models are trained on vastly more data than children are exposed to during language acquisition, raising questions about their value for understanding the human language system. Here, we systematically manipulate the amount of training data that ANN models receive and show that models trained on developmentally plausible amounts of language data (approximately 100 million words, roughly corresponding to a child’s first 10 years of life) achieve near-maximal performance on human neural and behavioral benchmarks. These developmentally plausible models—rather than models that achieve state-of-the-art performance on the next-word prediction task—hold substantial promise for providing mechanistic-level insights into human language processing.

Competing Interest Statement

The authors have declared no competing interest.
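The abstract's key quantitative measure is perplexity: the exponentiated average negative log-probability a model assigns to each observed next word, so lower values mean better next-word prediction. A minimal sketch of that computation (the log-probability values here are hypothetical, not from the paper's models):

```python
import math

def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Lower perplexity means the model assigns higher probability to the
    observed tokens, i.e. better next-word prediction performance.
    """
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Illustrative (made-up) log-probabilities for a 4-token sequence:
logps = [math.log(0.25), math.log(0.5), math.log(0.125), math.log(0.25)]
print(perplexity(logps))  # → 4.0: as uncertain as a uniform choice among 4 words
```

A perplexity of 4 means the model is, on average, as uncertain about each next word as if it were choosing uniformly among four equally likely options; the paper's finding is that models reaching sufficiently low perplexity also align more strongly with the human benchmarks.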