Supervised k-Means Clustering

msra (2008)

Citations: 39 | Views: 62


Abstract

The k-means clustering algorithm is one of the most widely used, effective, and best understood clustering methods. However, successful use of k-means requires a carefully chosen distance measure that reflects the properties of the clustering task. Since designing this distance measure by hand is often difficult, we provide methods for training k-means using supervised data. Given training data in the form of sets of items with their desired partitioning, we provide a structural SVM method that learns a distance measure so that k-means produces the desired clusterings. We propose two variants of the method, one based on a spectral relaxation and one based on the traditional k-means algorithm, that are both computationally efficient. For each variant, we provide a theoretical characterization of its accuracy in solving the training problem. We also provide an empirical clustering quality and runtime analysis of these learning methods on varied high-dimensional datasets.

... cluster articles which are about the same story, as opposed to other criteria. Unfortunately, hand-tuning the similarity measure for specific tasks such as these is difficult, since it is unclear how changes in the similarity measure relate to the behavior of the k-means algorithm.

In this paper we propose a supervised learning approach to finding a similarity measure so that k-means provides the desired clusterings for the task at hand. Given training examples of item sets with their correct clusterings, the goal is to learn a similarity measure so that future sets of items are clustered in a similar fashion. In particular, we provide a structural support vector machine (SSVM) algorithm for this supervised k-means learning problem, capable of directly optimizing a parameterized similarity measure to maximize cluster accuracy. We show theoretically and empirically that the algorithm is efficient, and that it provides improved clustering accuracy compared to non-learning methods, as well as compared to more conventional approaches to this supervised clustering problem.
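To make the role of a parameterized distance measure concrete, the following is a minimal sketch, not the paper's method. It assumes the simplest possible parameterization, one non-negative weight per feature, and sets the weights by hand, whereas the paper learns its parameters with a structural SVM. The names `weighted_kmeans` and `pair_agreement` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_kmeans(X, k, w, n_iter=50, seed=0):
    """Lloyd's k-means under the distance sum_j w[j] * (x[j] - c[j])**2.

    Assumption: a diagonal (per-feature) weighting stands in for the
    learned similarity measure; w is fixed by hand here, not learned.
    """
    rng = np.random.default_rng(seed)
    # Scaling each feature by sqrt(w[j]) turns the weighted distance
    # into the ordinary squared Euclidean distance.
    Z = X * np.sqrt(w)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def pair_agreement(labels, truth):
    """Fraction of item pairs on which two partitionings agree
    (same cluster vs. different cluster)."""
    same_pred = labels[:, None] == labels[None, :]
    same_true = truth[:, None] == truth[None, :]
    upper = np.triu_indices(len(labels), k=1)
    return (same_pred[upper] == same_true[upper]).mean()

# Toy data: feature 0 separates the two desired clusters,
# feature 1 is a high-variance distractor.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
informative = truth * 5.0 + rng.normal(0.0, 0.1, size=40)
distractor = rng.normal(0.0, 10.0, size=40)
X = np.column_stack([informative, distractor])

acc_uniform = pair_agreement(weighted_kmeans(X, 2, np.array([1.0, 1.0])), truth)
acc_weighted = pair_agreement(weighted_kmeans(X, 2, np.array([1.0, 0.0])), truth)
print("uniform weights:", acc_uniform)
print("informative-only weights:", acc_weighted)
```

The same k-means procedure yields different partitionings depending only on the weights, which is why choosing them from example clusterings, rather than by hand, is the learning problem the abstract describes.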


Keywords

machine learning, computer science, clustering, k-means
