Testing the Relationship between Word Length, Frequency, and Predictability Based on the German Reference Corpus

Cognitive Science (2022)

Abstract
In a recent article, Meylan and Griffiths (2021, henceforth M&G) focus their attention on the significant methodological challenges that can arise when using large-scale linguistic corpora. To this end, M&G revisit a well-known result of Piantadosi, Tily, and Gibson (2011, henceforth PT&G), who argue that average information content is a better predictor of word length than word frequency. We applaud M&G, who conducted a very important study that should be read by any researcher interested in working with large-scale corpora. The fact that M&G mostly failed to find clear evidence in favor of PT&G's main finding motivated us to test PT&G's idea on a subset of the largest archive of German language texts designed for linguistic research, the German Reference Corpus, which consists of approximately 43 billion words. We find only very little support for the primary data point reported by PT&G.
Keywords
Compression, Corpus linguistics, Information theory, Large-scale corpora, N-gram modeling, Uniform information density