An Investigation of Prior Specification on Parameter Recovery for Latent Dirichlet Allocation of Constructed-Response Items

Quantitative Psychology (2022)

Abstract
Latent Dirichlet Allocation (LDA) is a probabilistic model for analyzing textual data. It was originally developed for corpora containing large amounts of text, such as large sets of journal abstracts, blogs, and newspaper articles. Recently, LDA has been applied in psychological and educational measurement to analyze examinees’ responses to open-ended items on assessments. The amount of textual data in educational measurement scenarios, however, is notably smaller than the amount originally used with LDA, so the observed data may not be sufficient to recover the parameters accurately. It is therefore important to explore how various priors influence parameter recovery for the LDA model. In this study, we investigated the effects of the prior hyperparameters on parameter recovery through a simulation using conditions that are common in educational assessment settings. Specifically, five sets of priors, ranging from highly informative to noninformative, were used. For each set of priors, four factors were manipulated and fully crossed, for a total of 108 conditions. The four factors were: number of unique words (3 levels: 250, 500, and 750 words), average response length (3 levels: 5, 25, and 50 words per document), number of documents (3 levels: 100, 250, and 500 documents), and number of topics (3 levels: 3, 4, and 5 topics). The results of the simulation showed that the prior specification of the LDA model influenced parameter recovery rates.
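To make the simulation design concrete, the following is a minimal sketch (not the authors' code) of how one synthetic corpus can be drawn from the standard LDA generative process, with symmetric Dirichlet priors alpha (document-topic) and beta (topic-word) standing in for one "set of priors." The function name simulate_lda_corpus, the default hyperparameter values, and the use of Poisson-distributed document lengths are illustrative assumptions, not details taken from the study. The helper factorial_conditions crosses the four factor levels exactly as listed; note that crossing four three-level factors gives 3 × 3 × 3 × 3 = 81 cells, so the reported total of 108 conditions presumably reflects a design detail not described in the abstract.

```python
import itertools

import numpy as np

# Factor levels as listed in the abstract.
N_WORDS = (250, 500, 750)   # vocabulary size V (number of unique words)
AVG_LENGTH = (5, 25, 50)    # average response length (words per document)
N_DOCS = (100, 250, 500)    # number of documents D
N_TOPICS = (3, 4, 5)        # number of topics K


def simulate_lda_corpus(n_words, avg_length, n_docs, n_topics,
                        alpha=0.5, beta=0.1, seed=0):
    """Draw one synthetic corpus from the standard LDA generative process.

    alpha and beta are symmetric Dirichlet hyperparameters; the defaults are
    hypothetical placeholders, not the priors used in the study. Document
    lengths are assumed Poisson(avg_length), with empty documents bumped to 1.
    """
    rng = np.random.default_rng(seed)
    # Topic-word distributions: phi[k] ~ Dirichlet(beta, ..., beta) over V words.
    phi = rng.dirichlet(np.full(n_words, beta), size=n_topics)
    # Document-topic distributions: theta[d] ~ Dirichlet(alpha, ..., alpha) over K topics.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)
    docs = []
    for d in range(n_docs):
        length = max(1, int(rng.poisson(avg_length)))
        # For each token: draw a topic z from theta[d], then a word id from phi[z].
        z = rng.choice(n_topics, size=length, p=theta[d])
        words = np.array([rng.choice(n_words, p=phi[k]) for k in z])
        docs.append(words)
    return docs, theta, phi


def factorial_conditions():
    """Fully cross the four factors: (V, average length, D, K) tuples."""
    return list(itertools.product(N_WORDS, AVG_LENGTH, N_DOCS, N_TOPICS))


if __name__ == "__main__":
    for v, length, d, k in factorial_conditions()[:2]:
        docs, theta, phi = simulate_lda_corpus(v, length, d, k)
        print(v, length, d, k, len(docs), theta.shape, phi.shape)
```

In a recovery study of this kind, the true theta and phi returned here would be compared with the estimates from fitting LDA to docs under each prior set; smaller alpha and beta concentrate the Dirichlet draws on fewer topics and words, which is one way a prior can be made more or less informative.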
Keywords
Educational topic models, Model recovery, Simulation study