Anytime Frequent Itemset Mining Of Transactional Data Streams

Big Data Research (2020)

Abstract
Mining frequent itemsets from transactional data streams has become essential, with applications such as stock market analysis, retail chain analysis, and web log analysis. Various algorithms have been proposed to efficiently mine single-port and multi-port transactional streams within the constraints of limited time and memory. However, all of them are budget algorithms, i.e., they cannot handle varying inter-arrival rates of transactions or high-speed streams. They are constrained by a maximum limit on the inter-arrival rate of transactions, beyond which they fail to process the stream. These algorithms also cannot give immediate mining results on demand, even at the cost of reduced accuracy. Handling variable arrival rates and returning an immediate result when required are the two properties that characterize an anytime algorithm. We propose ANYFI, the first anytime frequent itemset mining algorithm for data streams. ANYFI uses a novel data structure, BFI-forest, which is capable of handling transactions arriving at a variable rate. It maintains itemsets in the BFI-forest in such a way that it can give a mining result almost immediately when the time allowance for mining is very small, and can refine its accuracy as the time allowance increases. We also propose MPANYFI, which extends ANYFI into a parallel framework for anytime frequent itemset mining of multi-port data streams over commodity clusters. It uses ANYFI at each computing node of the cluster. Our extensive experimental analysis shows that ANYFI can handle high stream speeds close to 60,000 trans/sec with recall close to 100%. The experiments also show the efficiency of MPANYFI. © 2020 Elsevier Inc. All rights reserved.
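To make the anytime property concrete, here is a minimal Python sketch of a miner that returns a coarse result immediately and refines it as the time allowance grows. It is not ANYFI and does not use a BFI-forest, whose layout the abstract does not describe. It substitutes a plain level-wise (Apriori-style) search over a fixed batch of transactions, and the function name `anytime_frequent_itemsets` and its parameters are hypothetical.

```python
import time
from collections import Counter
from itertools import combinations


def anytime_frequent_itemsets(transactions, min_support, time_allowance):
    """Return {itemset: count} for the itemsets confirmed frequent in time.

    Illustrative stand-in only: a level-wise search with a deadline,
    not the ANYFI algorithm or its BFI-forest structure.
    """
    deadline = time.monotonic() + time_allowance
    txns = [frozenset(t) for t in transactions]

    # Level 1 always runs to completion, so some result exists
    # even when the time allowance is zero.
    counts = Counter(item for t in txns for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}

    current = list(frequent)
    k = 2
    # Refine with larger itemsets only while time remains.
    while current and time.monotonic() < deadline:
        items = sorted({i for s in current for i in s})
        previous = set(current)
        level = {}
        for cand in combinations(items, k):
            if time.monotonic() >= deadline:
                break  # keep whatever this level has confirmed so far
            # Apriori pruning: every (k-1)-subset must already be frequent.
            if any(frozenset(s) not in previous for s in combinations(cand, k - 1)):
                continue
            support = sum(1 for t in txns if t.issuperset(cand))
            if support >= min_support:
                level[frozenset(cand)] = support
        frequent.update(level)
        current = list(level)
        k += 1
    return frequent


txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
quick = anytime_frequent_itemsets(txns, min_support=3, time_allowance=0.0)
full = anytime_frequent_itemsets(txns, min_support=3, time_allowance=5.0)
```

With a zero allowance the sketch reports only the frequent single items. With a generous allowance it also reports the frequent pairs. A shorter allowance can lower recall, but it never adds an itemset that is not frequent in the batch.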
Keywords
Data streams, Frequent itemset mining, Anytime frequent itemset mining