Clust-LDA: Joint Model for Text Mining and Author Group Inference

Shaoyang Ning, Xi Qu, Victor Cai, Nathan Sanders

arXiv: Information Retrieval (2018)

Abstract
Social media corpora pose unique challenges and opportunities, including typically short document lengths and rich metadata such as author characteristics and relationships. This creates great potential for systematic analysis of the enormous user base, with implications for industrial strategies such as targeted marketing. Here we propose a novel and statistically principled method, clust-LDA, which incorporates authorship structure into topic modeling, thereby inferring topics across documents on the basis of authorship and, simultaneously, identifying groupings among authors. We develop an inference procedure for clust-LDA and demonstrate its performance on simulated data, showing that clust-LDA outperforms "vanilla" LDA on the topic identification task when authors exhibit distinctive topical preferences. We also showcase the empirical performance of clust-LDA on a real-world social media dataset from Reddit.