Highly Scalable Sequential Pattern Mining Based on MapReduce Model on the Cloud

Big Data (2013)

Abstract
Sequential pattern mining is an essential data mining technique that has been widely applied in many real-world applications. However, traditional algorithms generally suffer from scalability problems when dealing with big data. In this paper, we aim to scale sequential pattern mining up significantly and propose the Sequential PAttern Mining algorithm based on the MapReduce model on the Cloud (abbreviated as SPAMC). Building on the earlier SPAM algorithm, we design an iterative MapReduce framework that efficiently generates and prunes candidate patterns while constructing the lexical sequence tree. This framework not only distributes the sub-tasks of tree construction to independent mappers in parallel, but also enables parallel processing of support counting. We conduct extensive experiments in a cloud environment of 32 virtual machines with up to 12.8 million transactional sequences. Experimental results show that SPAMC can significantly reduce mining time on big data, achieve extremely high scalability, and provide perfect load balancing on the cloud cluster.
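The iterative generate, count, and prune loop described in the abstract can be illustrated with a small single-process sketch. This is not the SPAMC implementation: it assumes each sequence is a tuple of single items (no itemsets, no bitmap representation as in SPAM), it extends every surviving pattern by every item rather than using pruned candidate lists, and it simulates mappers and a reducer with plain Python functions. The names map_support, reduce_support, and mine are hypothetical.

```python
from collections import Counter
from itertools import chain


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def map_support(partition, candidates):
    """Simulated mapper: emit (candidate, 1) for each sequence containing it."""
    for sequence in partition:
        for cand in candidates:
            if is_subsequence(cand, sequence):
                yield cand, 1


def reduce_support(pairs, min_support):
    """Simulated reducer: sum counts and drop candidates below min_support."""
    counts = Counter()
    for cand, n in pairs:
        counts[cand] += n
    return {c: n for c, n in counts.items() if n >= min_support}


def mine(partitions, min_support):
    """Run one simulated map/reduce round per level of the pattern tree."""
    items = sorted({x for part in partitions for seq in part for x in seq})
    candidates = [(x,) for x in items]
    frequent = {}
    while candidates:
        # Each partition stands in for the input split of one mapper.
        pairs = chain.from_iterable(
            map_support(part, candidates) for part in partitions
        )
        level = reduce_support(pairs, min_support)
        frequent.update(level)
        # Assumption: grow only surviving patterns, by one appended item each.
        candidates = [p + (x,) for p in sorted(level) for x in items]
    return frequent


partitions = [
    [("a", "b", "c"), ("a", "c")],  # split handled by mapper 1
    [("b", "c"), ("a", "b")],       # split handled by mapper 2
]
result = mine(partitions, min_support=2)
```

In this toy run, each pass of the while loop corresponds to one MapReduce iteration over one level of the tree, and only patterns that survive pruning are extended in the next pass.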
Keywords
mapreduce framework, mining time, parallel processing, highly scalable sequential pattern, lexical sequence tree, big data, candidate pattern generation, candidate pattern pruning, trees (mathematics), transactional sequences, spamc, virtual machines, sequential pattern mining, resource allocation, mapreduce model, support counting, cloud cluster, high scalability, scalability problem, data mining, cloud environment, essential data mining technique, data mining technique, cloud computing, sequential pattern mining algorithm, load balancing, iterative mapreduce framework