A three-way cluster ensemble approach for large-scale data

International Journal of Approximate Reasoning (2019)

Cluster ensembles have emerged as a powerful technique for combining multiple clustering results. To address clustering on large-scale data, this paper presents an efficient three-way cluster ensemble approach based on Spark that handles both hard and soft clustering. First, inspired by the theory of three-way decisions, the paper proposes a Spark-based framework for three-way cluster ensembles and develops a distributed three-way k-means clustering algorithm. Then, we introduce the concept of a cluster unit, which reflects the minimal-granularity distribution structure agreed on by all ensemble members. We also introduce quantitative measures of the relationships between units and between clusters. Finally, we propose a consensus clustering algorithm based on cluster units and devise several three-way decision strategies for assigning small cluster units and no-unit objects. Experiments on 19 real-world data sets validate the effectiveness of the proposed approach under indices such as ARI, ACC, NMI and F1-measure. The results show that the approach deals effectively with large-scale data, and that the proposed consensus clustering algorithm has a lower time cost without sacrificing clustering quality.
Cluster ensemble, Three-way decisions, Large-scale data, Cluster units, Spark
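
The abstract does not spell out how cluster units are constructed. As a rough illustration only, the sketch below assumes each ensemble member is a hard label vector and reads "agreed by all the ensemble members" as "placed in the same cluster by every member", so a unit is a maximal group of objects that no member separates. The names `cluster_units` and `split_by_size`, the `min_size` threshold, and the toy labelings are hypothetical and not taken from the paper. The code runs on a single machine in plain Python rather than on Spark, and it does not cover the soft (three-way) case.

```python
from collections import defaultdict


def cluster_units(labelings):
    """Group object indices that share a label in every ensemble member.

    labelings: list of equal-length label sequences, one per member.
    Returns a list of units, each a list of object indices.
    """
    n = len(labelings[0])
    if any(len(member) != n for member in labelings):
        raise ValueError("all ensemble members must label the same objects")

    groups = defaultdict(list)
    for i in range(n):
        # The tuple of labels across members identifies the unit of object i.
        signature = tuple(member[i] for member in labelings)
        groups[signature].append(i)
    return list(groups.values())


def split_by_size(units, min_size):
    """Separate units into large and small ones (min_size is an assumed threshold)."""
    large = [u for u in units if len(u) >= min_size]
    small = [u for u in units if len(u) < min_size]
    return large, small


# Toy ensemble: three members labelling six objects (illustrative data only).
members = [
    [0, 0, 0, 1, 1, 1],
    [5, 5, 5, 7, 7, 5],
    [2, 2, 2, 3, 3, 3],
]
units = cluster_units(members)
large, small = split_by_size(units, min_size=2)
print(units)   # [[0, 1, 2], [3, 4], [5]]
print(small)   # [[5]]
```

In this toy run, object 5 ends up in a unit by itself because the second member disagrees with the other two about it. Under the assumptions above, it is small units of this kind that a later assignment step, such as the three-way decision strategies mentioned in the abstract, would have to place.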