Efficient Skyline Frequent-Utility Itemset Mining Algorithm on Massive Data.

IEEE Trans. Knowl. Data Eng. (2024)

Abstract
Frequent itemset mining (FIM) and high-utility itemset mining (HUIM) are two important branches of itemset mining, a key technology for knowledge discovery in many applications. Many algorithms exist for FIM and HUIM, but few studies consider frequency and utility together, so skyline frequent-utility itemset mining (SFUIM) was proposed to find useful itemsets under both frequency and utility measures. SFUIM is more challenging than FIM and HUIM: without any threshold, the search space is large and the computation is expensive, especially on large-scale databases. To address this, this paper proposes PSI*, a novel prefix-based algorithm for mining skyline frequent-utility itemsets on massive data. PSI* divides the huge database by prefix-based partitioning, so that computing the itemsets with a specific prefix item involves only one partition instead of the whole database. A multilevel-index based list is presented to compactly maintain the maximal utility under the frequency constraint, and a novel grid-based structure is devised to organize partitions or items in a designed order. Moreover, four efficient pruning strategies are proposed to prune itemsets as early as possible. Extensive experiments show that PSI* outperforms the state-of-the-art algorithms, most clearly on large-scale databases.
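To make the problem statement concrete, the following is a minimal brute-force sketch of skyline frequent-utility mining with a simple prefix-based projection. It is not the PSI* algorithm: the multilevel-index list, grid structure, and pruning strategies are omitted, and the toy database `DB`, the alphabetical item order, and all function names are assumptions made for illustration. It also assumes the usual skyline notion, in which an itemset is kept when no other itemset is at least as good in both frequency and utility and strictly better in one.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to the
# utility that item contributes in that transaction.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 5, "c": 3},
    {"b": 4, "c": 2},
    {"a": 1, "b": 20},
]

def freq_and_utility(itemset, transactions):
    """Return (frequency, utility) of an itemset over the given transactions."""
    freq, util = 0, 0
    for t in transactions:
        if all(i in t for i in itemset):
            freq += 1
            util += sum(t[i] for i in itemset)
    return freq, util

def dominates(p, q):
    """p dominates q if it is no worse in both measures and differs in one."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q

def prefix_partitions(transactions, order):
    """For each item x, keep only transactions containing x, projected
    onto x and the items that follow x in the chosen order."""
    rank = {item: k for k, item in enumerate(order)}
    return {
        x: [
            {i: u for i, u in t.items() if rank[i] >= rank[x]}
            for t in transactions
            if x in t
        ]
        for x in order
    }

def skyline(transactions):
    """Enumerate every itemset inside its prefix partition, then keep
    only the non-dominated (frequency, utility) pairs."""
    order = sorted({i for t in transactions for i in t})
    parts = prefix_partitions(transactions, order)
    scored = {}
    for idx, x in enumerate(order):
        rest = order[idx + 1:]
        for k in range(len(rest) + 1):
            for tail in combinations(rest, k):
                itemset = (x,) + tail
                f, u = freq_and_utility(itemset, parts[x])
                if f > 0:
                    scored[itemset] = (f, u)
    return {
        s: fu
        for s, fu in scored.items()
        if not any(dominates(other, fu) for other in scored.values())
    }

result = skyline(DB)
```

On this toy data, `{b}` has frequency 3 and utility 26, while `{a, b}` has frequency 2 and utility 28; neither dominates the other, so both stay in the skyline. The sketch enumerates every itemset, which is exponential in the number of items and is exactly the cost that partitioning and pruning are meant to reduce.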
Keywords
Skyline frequent-utility itemset mining, massive data, prefix-based partitioning, grid storage, multilevel-index based list