A Semi Brute-Force Search Approach for (Balanced) Clustering

Algorithmica (2024)

Abstract
Clustering is one of the longest-standing fundamental problems in computational geometry and algorithm design. In this paper, we focus on variance-based clustering problems, which include the widely known k-means clustering. As the main contribution, a so-called semi brute-force search approach is proposed and analyzed from both theoretical and experimental perspectives. The proposed approach samples a small percentage of the input dataset and searches in a brute-force way for a seed of size k whose resulting Voronoi diagram gives a good clustering of the original dataset. Under certain assumptions, this clustering is, with high probability, provably a good estimate of the optimum. Extensive experiments on both synthetic and real-world datasets show that, to obtain results competitive with the k-means method (Lloyd, IEEE Trans Inf Theory 28(2):129–137, 1982) and the k-means++ method (Arthur and Vassilvitskii, in: 18th ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007), we only need a sample whose size is 7% of the input dataset. If we are allowed to sample 15% of the dataset, our algorithm outperforms both the k-means and k-means++ methods in at least 80% of the clustering tasks. In addition, an extended algorithm based on the same idea guarantees a balanced k-clustering result.
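To make the sampling-plus-exhaustive-search idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the default `sample_frac=0.07` (borrowed from the 7% figure reported above), and the choice of scoring each candidate seed by the k-means cost of its Voronoi cells (squared distances to each cell's centroid) are all illustrative assumptions. It simply draws a random sample, tries every k-subset of the sample as a seed, partitions the full dataset by nearest seed, and keeps the seed with the lowest cost.

```python
import itertools
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def voronoi_partition(data: Sequence[Point], seeds: Sequence[Point]) -> List[List[Point]]:
    """Assign every point to its nearest seed, i.e. to that seed's Voronoi cell."""
    cells: List[List[Point]] = [[] for _ in seeds]
    for p in data:
        nearest = min(range(len(seeds)), key=lambda j: sq_dist(p, seeds[j]))
        cells[nearest].append(p)
    return cells


def kmeans_cost(cells: Sequence[Sequence[Point]]) -> float:
    """Assumed variance-based objective: sum of squared distances to each cell's centroid."""
    total = 0.0
    for cell in cells:
        if not cell:  # an empty cell contributes nothing
            continue
        dim = len(cell[0])
        centroid = [sum(p[d] for p in cell) / len(cell) for d in range(dim)]
        total += sum(sq_dist(p, centroid) for p in cell)
    return total


def semi_brute_force(
    data: Sequence[Point], k: int, sample_frac: float = 0.07, seed: int = 0
) -> Tuple[Optional[List[Point]], float]:
    """Hypothetical sketch: exhaustive search over k-subsets of a random sample."""
    if not 0 < k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    rng = random.Random(seed)
    m = min(len(data), max(k, round(sample_frac * len(data))))  # sample size
    sample = rng.sample(list(data), m)
    best_seeds: Optional[List[Point]] = None
    best_cost = float("inf")
    # Brute force: C(m, k) candidate seeds, each scored on the FULL dataset.
    for cand in itertools.combinations(sample, k):
        cost = kmeans_cost(voronoi_partition(data, cand))
        if cost < best_cost:
            best_seeds, best_cost = list(cand), cost
    return best_seeds, best_cost


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(semi_brute_force(pts, k=2, sample_frac=1.0))
```

On the two well-separated groups in the usage example, the best seed takes one point from each group, giving a cost of 8/3. Because the search enumerates all C(m, k) subsets of the sample, it is only practical when both the sample size m and k are small; this sketch makes no attempt to reproduce the paper's probabilistic guarantees or its balanced variant.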
Keywords
Clustering,Balanced clustering,Probabilistic algorithm,k-means,k-means++