A heuristic approach for load balancing the FP-growth algorithm on MapReduce.

Sikha Bagui, Keerthi Devulapalli, John Coffey

Array (2020)

Abstract
Frequent itemset discovery is an important step in Association Rule Mining. The Frequent Pattern (FP) growth algorithm, often used for discovering frequent itemsets, cannot scale directly to today’s Big Data, especially for large sparse datasets. Hence there is a need to distribute and parallelize the FP-growth algorithm. Parallel FP-growth (PFP) is a parallel implementation of the FP-growth algorithm on Hadoop’s MapReduce execution framework. Though PFP scales to large datasets, it suffers from imbalanced load across processing units. In this paper we propose a heuristic-based, lower-complexity load balancing strategy for the PFP algorithm, called Heuristic Based PFP (HBPFP). Our results show that HBPFP distributes the load more evenly across the Hadoop cluster nodes, runs faster than the PFP algorithm, and uses cluster resources more efficiently, especially for large sparse datasets.
Keywords
Association rule mining, Frequent pattern growth algorithm, Load balancing, MapReduce, Hadoop
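
The abstract does not spell out the heuristic that HBPFP uses, so the following is only a generic illustration of the load balancing problem it refers to: assigning frequent items to groups (one group per processing unit) so that the estimated work per group is roughly even. The function name `balance_groups`, the per-item load estimates, and the greedy "heaviest item to least-loaded group" rule are all assumptions made for this sketch, not the authors' method.

```python
import heapq


def balance_groups(item_loads, num_groups):
    """Greedy sketch: give each item to the currently least-loaded group.

    item_loads: dict mapping item -> estimated load (hypothetical estimates;
                the abstract does not say how HBPFP estimates load).
    num_groups: number of groups, e.g. one per processing unit.
    Returns (groups, totals): the items in each group and each group's load.
    """
    # Min-heap of (current total load, group index).
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Place the heaviest items first; ties are broken by item name so the
    # result is deterministic.
    for item, load in sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))

    totals = [sum(item_loads[i] for i in grp) for grp in groups]
    return groups, totals


if __name__ == "__main__":
    # Made-up load estimates, for illustration only.
    loads = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}
    groups, totals = balance_groups(loads, 2)
    print(groups, totals)
```

A greedy rule like this is cheap (one sort plus one heap operation per item), which is in the spirit of a low-complexity balancing step, but how well it works depends entirely on the quality of the load estimates.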