The Children’s Picture Books Lexicon (CPB-Lex): A large-scale lexical database from children’s picture books

Behavior Research Methods (2023)

Abstract
This article presents CPB-Lex, a large-scale database of lexical statistics derived from children’s picture books (age range 0–8 years). Such a database is essential for research in psychology, education and computational modelling, where rich details on the vocabulary of early print exposure are required. CPB-Lex was built through an innovative method of computationally extracting lexical information from automatic speech-to-text captions and subtitle tracks generated from social media channels dedicated to reading picture books aloud. It consists of approximately 25,585 types (wordforms) and their frequency norms (raw and Zipf-transformed), a lexicon of bigrams (two-word sequences and their transitional probabilities) and a document-term matrix (which shows how important each word in the corpus is to each book). Several immediate contributions of CPB-Lex to behavioural science research are reported, including the finding that the new CPB-Lex frequency norms strongly predict age of acquisition and outperform comparable child-input lexical databases. The database allows researchers and practitioners to extract lexical statistics for high-frequency words, which can be used to develop word lists. The paper concludes with an investigation of how CPB-Lex can be used to extend recent modelling research on the lexical diversity children receive from picture books in addition to child-directed speech. Our model shows that the vocabulary input from a relatively small number of picture books can dramatically enrich vocabulary exposure from child-directed speech and potentially assist children with vocabulary input deficits. The database is freely available from the Open Science Framework repository: https://tinyurl.com/4este73c.
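The abstract names three kinds of lexical statistics: Zipf-transformed frequencies, bigram transitional probabilities, and a document-term matrix. The Python sketch below shows one plausible way to compute each from tokenized transcripts; it is an illustration, not the authors' pipeline. Assumptions: the Zipf scale is taken as log10 of frequency per million words plus 3, with +1 smoothing; transitional probability is the forward probability P(w2 | w1); and the document-term weights use a simple tf-idf variant. The helper names (`tokenize`, `zipf_scores`, `transitional_probs`, `tfidf_matrix`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def zipf_scores(counts):
    # Zipf = log10(frequency per million words) + 3, here with +1 smoothing:
    # log10((count + 1) / (corpus_millions + types_millions)) + 3.
    # The exact smoothing used by CPB-Lex is an assumption.
    corpus_millions = sum(counts.values()) / 1_000_000
    types_millions = len(counts) / 1_000_000
    denom = corpus_millions + types_millions
    return {w: math.log10((c + 1) / denom) + 3 for w, c in counts.items()}


def transitional_probs(books):
    # Forward transitional probability:
    # P(w2 | w1) = count(w1 w2) / count(w1 as the first word of a bigram).
    # Bigrams are counted within each book so that none spans two books.
    bigram_counts = Counter()
    first_counts = Counter()
    for tokens in books:
        pairs = list(zip(tokens, tokens[1:]))
        bigram_counts.update(pairs)
        first_counts.update(w1 for w1, _ in pairs)
    return {(w1, w2): c / first_counts[w1] for (w1, w2), c in bigram_counts.items()}


def tfidf_matrix(books):
    # Rows = books, columns = word types;
    # cell = (term count / book length) * log(number of books / document frequency).
    # Using tf-idf as the "importance" weighting is an assumption.
    vocab = sorted({w for tokens in books for w in tokens})
    n_docs = len(books)
    doc_freq = Counter(w for tokens in books for w in set(tokens))
    matrix = []
    for tokens in books:
        tf = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero for an empty book
        matrix.append([(tf[w] / length) * math.log(n_docs / doc_freq[w]) for w in vocab])
    return vocab, matrix


# Two toy "books" standing in for caption transcripts.
books = [tokenize("The cat sat on the mat"), tokenize("The dog sat on the log")]
counts = Counter(w for tokens in books for w in tokens)

zipf = zipf_scores(counts)
tp = transitional_probs(books)
vocab, matrix = tfidf_matrix(books)

print(zipf["the"], zipf["cat"])
print(tp[("the", "cat")])              # → 0.25
print(tp[("sat", "on")])               # → 1.0
print(matrix[0][vocab.index("the")])   # → 0.0 ("the" occurs in every book)
```

On a realistic corpus of millions of tokens, Zipf values typically fall between about 1 and 7; the tiny toy corpus here inflates them, so only their relative order is meaningful. Note also that a word occurring in every book receives zero tf-idf weight, which is why this weighting highlights book-specific vocabulary rather than function words.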
Keywords
Lexical database, Child input norms, Picture books, Early print exposure, Age of acquisition