Does corpus size influence normalised frequencies?

crossref(2024)

Abstract
Several frequency-based corpus linguistic measures are strongly influenced by corpus size (e.g. measures of lexical diversity or text similarity metrics). Normalised frequencies are widely assumed to correct for the influence of corpus size, but it has not yet been systematically tested whether and how they are themselves influenced by it. We approached this question by testing the association between lists of normalised frequencies derived from corpus samples of different sizes from five typologically diverse languages. Our results suggest that the size of the underlying corpora does not negatively influence comparisons of normalised frequency lists, i.e. differences in corpus size do not lead to lower associations between the derived lists. Rather, the findings indicate that larger corpora (even in combination with smaller corpora) always have a positive effect, in the sense that the association between the compared lists increases. For types in lower frequency ranges, however, these associations decrease rather quickly.
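
To make the comparison concrete, the following is a minimal sketch of how two normalised frequency lists could be derived and compared. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not name the association measure, so Spearman's rank correlation is used here as a stand-in, types missing from one list are assumed to have frequency 0, and the per-million scaling and all function names are hypothetical choices.

```python
from collections import Counter


def normalised_frequencies(tokens, per=1_000_000):
    """Frequency of each type, scaled to occurrences per `per` tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * per / total for word, count in counts.items()}


def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (sd_x * sd_y)


def spearman(freq_a, freq_b):
    """Spearman correlation over the union of types in both lists.

    Assumption: a type absent from one list is given frequency 0 there.
    """
    types = sorted(set(freq_a) | set(freq_b))
    ranks_a = average_ranks([freq_a.get(t, 0.0) for t in types])
    ranks_b = average_ranks([freq_b.get(t, 0.0) for t in types])
    return pearson(ranks_a, ranks_b)


# Toy samples of different sizes (placeholder data, not from the study).
small_sample = "the cat sat on the mat".split()
large_sample = "the cat sat on the mat and the dog sat on the log".split()

association = spearman(
    normalised_frequencies(small_sample),
    normalised_frequencies(large_sample),
)
print(f"Spearman association: {association:.3f}")
```

Restricting `types` to a particular frequency band before ranking would be one way to look at the lower frequency ranges separately, though how the study defines those ranges is not stated in the abstract.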