Improving Scalability of the Nested Partition-Based Clustering

Architecture and Civil Engineering 2015(2015)

Abstract
Many researchers have endeavored to improve the scalability of clustering algorithms in the data mining field, since the induction of data mining models generally takes longer as data size increases, even though computer systems are becoming capable of much faster calculation. Scalability is thus a critical issue facing the data mining community. One method for handling this problem is to use only a subset of the data. Another scalable approach is to use an efficient search technique to find a promising region that may contain a good solution. In this paper we investigate how to improve the scalability of the nested partition (NP) based clustering algorithm. For the scalability of the NP based algorithm, it is important to reduce the number of backtrackings, which arise to correct wrong partitioning moves. In the NP framework, we take solution samples from each partitioning region to evaluate the performance of solutions. If the variance of the solution performances is large, it indicates that the partition may be wrong, so the algorithm goes back to the previous parent region to correct the false partitioning move. We therefore employ a sampling scheme that can reduce the variance of the solution samples. We then show that the NP based clustering algorithm can be made scalable by this solution sampling scheme, which addresses the noisy performance problem.
Keywords
clustering, scalability, partition-based
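
The link the abstract draws between sample variance and backtracking can be illustrated with a small sketch. It is not the paper's sampling scheme, which the abstract does not specify. As a stand-in, it averages several noisy evaluations per solution sample, a generic variance reduction technique. The function names, the Gaussian noise model, and the variance threshold are all assumptions made for illustration.

```python
import random
import statistics


def noisy_performance(true_value, noise_sd, rng):
    """Hypothetical noisy evaluation of one sampled clustering solution."""
    return true_value + rng.gauss(0.0, noise_sd)


def sample_region(true_value, n_samples, replications, noise_sd, rng):
    """Draw n_samples performance estimates from one region.

    Each estimate is the mean of `replications` noisy evaluations.
    Averaging is an assumed stand-in for the paper's sampling scheme.
    """
    estimates = []
    for _ in range(n_samples):
        evaluations = [
            noisy_performance(true_value, noise_sd, rng)
            for _ in range(replications)
        ]
        estimates.append(statistics.fmean(evaluations))
    return estimates


def should_backtrack(samples, variance_threshold):
    """Assumed rule: backtrack when the sample variance exceeds a threshold."""
    return statistics.variance(samples) > variance_threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # Same region, same noise level; only the replication count differs.
    single = sample_region(5.0, n_samples=200, replications=1, noise_sd=1.0, rng=rng)
    averaged = sample_region(5.0, n_samples=200, replications=25, noise_sd=1.0, rng=rng)
    for label, samples in (("single", single), ("averaged", averaged)):
        print(label, statistics.variance(samples), should_backtrack(samples, 0.3))
```

Under these assumptions, the averaged estimates have roughly 1/25 of the variance of the single evaluations, so a fixed threshold triggers fewer spurious backtracks. The price is more evaluations per sample, which is the trade-off any such scheme has to balance against scalability.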