On Discovering the Number of Document Topics via Conceptual Latent Space
CIKM (2017)
Abstract
Topic modeling is a widely used technique in knowledge discovery and data mining. However, finding the right number of topics in a given text source remains a challenging problem. In this paper, we study the concept of conceptual stability via nonnegative matrix factorization (NMF). Based on this analysis, we propose a method to identify the correct number of topics and provide empirical evidence in its favor, in terms of both classification accuracy and agreement with the number of topics naturally present in the text sources. Experiments on real-world text corpora demonstrate that the proposed method outperforms state-of-the-art latent Dirichlet allocation (LDA) and NMF models.
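The abstract does not specify how conceptual stability is computed. As a rough illustration of the general idea of stability-based selection of the number of topics, the Python sketch below (assuming NumPy) factorizes a document-term matrix with NMF several times per candidate k, using different random initializations, and scores how consistently the same topic-term vectors reappear. It is a generic restart-agreement heuristic, not the authors' conceptual-stability measure; the function names `nmf`, `topic_agreement`, `stability`, and `choose_k` and the toy corpus are hypothetical.

```python
import itertools

import numpy as np


def nmf(V, k, n_iter=200, seed=0, eps=1e-10):
    """Factorize nonnegative V (docs x terms) as W @ H via Lee-Seung updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius loss keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def topic_agreement(H1, H2):
    """Symmetric mean of each topic's best cosine match in the other run."""
    A = H1 / (np.linalg.norm(H1, axis=1, keepdims=True) + 1e-12)
    B = H2 / (np.linalg.norm(H2, axis=1, keepdims=True) + 1e-12)
    S = A @ B.T  # k x k cosine similarities between topic-term rows
    return 0.5 * (S.max(axis=1).mean() + S.max(axis=0).mean())


def stability(V, k, n_runs=5, n_iter=200):
    """Average pairwise topic agreement across NMF runs with different seeds."""
    Hs = [nmf(V, k, n_iter=n_iter, seed=s)[1] for s in range(n_runs)]
    scores = [topic_agreement(a, b) for a, b in itertools.combinations(Hs, 2)]
    return float(np.mean(scores))


def choose_k(V, candidates, **kw):
    """Return the candidate k with the highest stability, plus all scores."""
    scores = {k: stability(V, k, **kw) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical toy corpus: 60 documents, 30 terms, 3 planted topics.
rng = np.random.default_rng(42)
H_true = np.kron(np.eye(3), np.ones((1, 10)))     # each topic owns 10 terms
W_true = np.eye(3)[rng.integers(0, 3, size=60)]   # each document draws one topic
V = W_true @ H_true + 0.05 * rng.random((60, 30))  # add small nonnegative noise

best_k, scores = choose_k(V, candidates=[2, 3, 4, 5])
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

The best-match score above does not force a one-to-one pairing of topics between runs; a stricter variant could use Hungarian matching, for example `scipy.optimize.linear_sum_assignment` on the negated similarity matrix. Agreement across random restarts is only one possible stability signal, and the paper additionally evaluates its choice of topic count through classification accuracy.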
Keywords
Topic Modeling, Nonnegative Matrix Factorization, Stability Analysis