An Effective Clustering Method over CF$^+$+ Tree Using Multiple Range Queries

Periodicals(2020)

引用 5|浏览56
暂无评分
摘要
AbstractMany existing clustering methods usually compute clusters from the reduced data sets obtained by summarizing the original very large data sets. BIRCH is a popular summary-based clustering method that first builds a CF tree, and then performs a global clustering using the leaf entries of the tree. However, to the best of our knowledge, no prior studies have proposed a global clustering method that uses the structure of a CF tree. Therefore, we propose a novel global clustering method ERC (effective multiple range queries-based clustering), which takes advantage of the structure of a CF tree. We further propose a CF$^+$+ tree, which optimizes the node split scheme used in the CF tree. As a result, the CF$^+$+-ERC (CF$^+$+ tree-based ERC) method effectively computes clusters over large data sets. Furthermore, it does not require a predefined number of clusters to compute the clusters. We present in-depth theoretical and experimental analyses of our method. Experimental results on very large synthetic data sets demonstrate that the proposed approach is effective in terms of cluster quality and robustness and is significantly faster than existing clustering methods. In addition, we apply our clustering method to real data sets and achieve promising results.
更多
查看译文
关键词
Clustering, BIRCH, CF tree, CF+ tree, range query, very large data sets
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要