Stochastic Neighbor Compression.
ICML'14: Proceedings of the 31st International Conference on Machine Learning, Volume 32 (2014)

Abstract
We present Stochastic Neighbor Compression (SNC), an algorithm to compress a dataset for the purpose of k-nearest neighbor (kNN) classification. Given training data, SNC learns a much smaller synthetic data set that minimizes the stochastic 1-nearest neighbor classification error on the training data. This approach has several appealing properties: due to its small size, the compressed set speeds up kNN testing drastically (up to several orders of magnitude, in our experiments); it makes the kNN classifier substantially more robust to label noise; on 4 of 7 data sets it yields lower test error than kNN on the entire training set, even at compression ratios as low as 2%; finally, SNC compression leads to impressive speed-ups over kNN even when kNN and SNC are both used with ball-tree data structures, hashing, and LMNN dimensionality reduction -- demonstrating that it is complementary to existing state-of-the-art algorithms for speeding up kNN classification and leads to substantial further improvements.
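The core idea described above -- learning synthetic points that minimize a stochastic (soft) 1-NN training error -- can be sketched as a small gradient-descent loop. The sketch below is a hedged illustration, not the paper's implementation: the softmax-over-distances objective, the bandwidth parameter `gamma`, the learning rate, and the class-balanced initialization are all assumptions chosen to make a minimal runnable example.

```python
import numpy as np

def snc_loss_grad(Z, zy, X, y, gamma=1.0):
    """Mean soft 1-NN error of a compressed set (Z, zy) on training data (X, y),
    and its gradient with respect to the synthetic points Z.

    Soft-assignment probabilities (an assumption modeled on stochastic
    neighborhood objectives): p_ij ∝ exp(-gamma^2 * ||x_i - z_j||^2).
    """
    n = X.shape[0]
    D = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    S = -gamma**2 * D
    S -= S.max(axis=1, keepdims=True)                    # numerical stability
    P = np.exp(S)
    P /= P.sum(axis=1, keepdims=True)                    # soft 1-NN assignment p_ij
    C = (zy[None, :] == y[:, None]).astype(float)        # c_ij = 1 if labels agree
    p_i = (P * C).sum(axis=1)                            # prob. each x_i is classified correctly
    loss = (1.0 - p_i).mean()
    # dL/dz_j = (1/n) * sum_i 2*gamma^2 * p_ij * (c_ij - p_i) * (z_j - x_i)
    W = (2.0 * gamma**2 / n) * P * (C - p_i[:, None])    # (n, m) chain-rule weights
    grad = W.sum(axis=0)[:, None] * Z - W.T @ X          # (m, d)
    return loss, grad

# Toy usage on two Gaussian blobs, compressing 100 points to 4 synthetic ones.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
# Initialize the compressed set with two real points per class (an assumption).
idx = np.concatenate([rng.choice(np.flatnonzero(y == c), 2, replace=False) for c in (0, 1)])
Z, zy = X[idx].copy(), y[idx].copy()

loss0, _ = snc_loss_grad(Z, zy, X, y)
for _ in range(300):                                     # plain gradient descent
    loss, grad = snc_loss_grad(Z, zy, X, y)
    Z -= 0.3 * grad
```

After optimization the four synthetic points act as the entire "training set" for 1-NN testing, which is where the drastic speed-up quoted in the abstract comes from: test cost scales with the compressed size, not the original data size.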