Scalable mining of social data using stochastic gradient Fisher scoring.

CIKM (2013)

Abstract
The rapid growth of social data in the form of videos, microblog posts and other items shared on social media presents new opportunities for learning user behavior and preferences. Bayesian models have been widely used for modeling social data, since they capture uncertainty and prior knowledge, avoid overfitting, and can be easily extended to incorporate new types of data. Researchers have used a variety of inference procedures to learn model parameters from data. In particular, the Stochastic Gradient Fisher Scoring (SGFS) method was recently proposed for efficient inference. This method samples from a Bayesian posterior using a small number of data samples in each iteration, instead of the entire data set, to speed up the inference process. In this paper we explore the feasibility of SGFS for social data mining. We find that SGFS often outperforms other inference methods on dense data, but it fails in the sparse "long tail", where there are not enough instances for it to learn parameters. This is problematic, because social data often has a long-tailed distribution. To address this problem, we propose hybrid SGFS (hSGFS) and evaluate its performance on a variety of social data sets. We find that hSGFS is better able to predict held-out items in data sets that have a long-tailed distribution.