What are they up to? Distilling the Twitter Stream of Subpopulations

Semantic Scholar (2018)

Abstract
Social network researchers have been tackling community detection and community search for over a decade. Detecting communities (small groups of people who know and interact with each other) has numerous applications, from marketing and computational advertising all the way to homeland security. By now, the problem can be considered mostly solved, in either its unsupervised form (community detection) or its semi-supervised form (community search). In our quest to answer general, and very exciting, questions such as "What are people up to?", "What do they care about?", and "What are they discussing?", we move beyond detecting communities to circumscribing subpopulations: large groups of people who share some common characteristics, for example activists, students, engineers, New Yorkers, or football fans. We want to know what < · · · > are talking about on Twitter, where < · · · > is any subpopulation. Initially, the subpopulation is characterized by a few representative members, who are treated as seeds in an iterative Personalized PageRank (PPR) framework that enlarges the subpopulation at each iteration. We immediately hit a scalability limitation, which we overcome by proposing the Splash PPR algorithm, inspired by Splash Belief Propagation. We implement Splash PPR on Apache Spark and show its efficiency and effectiveness in extracting the Twitter stream of a subpopulation of machine learning practitioners, thereby paving the way to distilling valuable signal out of the sea of Twitter noise.
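
The seed-and-expand idea described in the abstract can be illustrated with a minimal sketch. This is plain power-iteration Personalized PageRank on a small in-memory graph, not the Splash PPR algorithm and not a Spark implementation; the abstract does not give those details. The function names (`personalized_pagerank`, `grow_subpopulation`), the restart probability `alpha`, the iteration count, and the rule of adding the top-ranked non-members each round are all assumptions made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, alpha=0.15, iters=50):
    """Power-iteration PPR on a directed graph given as (src, dst) pairs.

    alpha is the restart probability (an assumed default); the walk
    restarts uniformly over the seed set.
    """
    out = defaultdict(list)
    seeds = set(seeds)
    nodes = set(seeds)
    for u, v in edges:
        out[u].append(v)
        nodes.update((u, v))

    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for u in nodes:
            share = (1.0 - alpha) * rank[u]
            if out[u]:
                # spread the remaining mass evenly over out-neighbours
                for v in out[u]:
                    nxt[v] += share / len(out[u])
            else:
                # dangling node: hand its mass back to the seeds
                for s in seeds:
                    nxt[s] += share / len(seeds)
        rank = nxt
    return rank


def grow_subpopulation(edges, seeds, rounds=2, add_per_round=1):
    """Hypothetical expansion loop: each round, rerun PPR from the current
    members and admit the highest-ranked non-members."""
    members = set(seeds)
    for _ in range(rounds):
        rank = personalized_pagerank(edges, members)
        candidates = sorted(
            (n for n in rank if n not in members),
            key=lambda n: (-rank[n], n),  # highest score first, ties by name
        )
        members.update(candidates[:add_per_round])
    return members


if __name__ == "__main__":
    # Toy follower graph: a, b, c are connected; x and y form a separate pair.
    toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"),
                 ("x", "y"), ("y", "x")]
    print(grow_subpopulation(toy_edges, ["a"]))
```

On the toy graph, nodes that cannot be reached from the seed receive no score, so the expansion stays within the seed's connected neighbourhood. Every round recomputes PPR over the whole graph, which hints at why a naive loop of this kind becomes expensive at Twitter scale.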