A novel cluster validity index based on augmented non-shared nearest neighbors

SSRN Electronic Journal(2023)

Abstract
In practical applications, the true number of clusters in a dataset is rarely known in advance. A cluster validity index is therefore needed to evaluate clustering results and determine the optimal number of clusters. However, the performance of existing cluster validity indices is sensitive to factors such as cluster shape and density. To address these issues, this paper proposes a new cluster validity index based on augmented non-shared nearest neighbors (ANCV). The ANCV index rests on two principles: (1) within-cluster compactness can be measured by the distance between pairs of data points that have fewer shared nearest neighbors; (2) between-cluster separation can be estimated from the distances between pairs of data points at the intersection of clusters. On this basis, these point pairs are extended to their augmented non-shared nearest neighbors, thereby forming small clusters. The average distances within and between these small clusters are then calculated to estimate the within-cluster compactness and the between-cluster separation, respectively. Finally, the optimal number of clusters is determined by the difference between the between-cluster separation and the within-cluster compactness. Experiments on 12 two-dimensional synthetic datasets and 10 real datasets from UCI show that the ANCV index performs best among all compared indices.
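The two principles above can be illustrated with a small sketch. This is not the ANCV index itself: the abstract does not define the augmented non-shared nearest neighbor expansion, so that step is omitted here. The function names (`k_nearest`, `shared_count`, `compactness`, `separation`, `toy_score`), the choice of `k`, and the specific way pairs are selected are all assumptions made for illustration only.

```python
import math
from itertools import combinations

def k_nearest(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors

def shared_count(neighbors, i, j):
    """Number of k-nearest neighbors that points i and j have in common."""
    return len(neighbors[i] & neighbors[j])

def compactness(points, labels, neighbors):
    """Assumed stand-in for principle (1): for each point, take the same-cluster
    neighbor with which it shares the fewest nearest neighbors, and average
    the distances of those pairs."""
    dists = []
    for i in range(len(points)):
        same = [j for j in neighbors[i] if labels[j] == labels[i]]
        if same:
            j = min(same, key=lambda j: (shared_count(neighbors, i, j), j))
            dists.append(math.dist(points[i], points[j]))
    return sum(dists) / len(dists) if dists else 0.0

def separation(points, labels):
    """Assumed stand-in for principle (2): for each pair of clusters, take the
    closest cross-cluster pair of points, and average those distances."""
    gaps = []
    for a, b in combinations(sorted(set(labels)), 2):
        gaps.append(min(math.dist(p, q)
                        for p, la in zip(points, labels) if la == a
                        for q, lb in zip(points, labels) if lb == b))
    return sum(gaps) / len(gaps) if gaps else 0.0

def toy_score(points, labels, k=2):
    """Separation minus compactness; larger is better in this toy version."""
    neighbors = k_nearest(points, k)
    return separation(points, labels) - compactness(points, labels, neighbors)

# Two well-separated square blobs (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
two_clusters = [0, 0, 0, 0, 1, 1, 1, 1]    # matches the blobs
three_clusters = [0, 0, 1, 1, 2, 2, 2, 2]  # splits the first blob in two
print(toy_score(points, two_clusters) > toy_score(points, three_clusters))
```

In this toy setting, the labeling that matches the two blobs scores higher than the one that splits a blob, which mirrors how the abstract says the difference between separation and compactness is used to pick the number of clusters.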
Keywords
Validity index, Within-cluster compactness, Between-cluster separation, Shared nearest neighbors