Statistically Optimal K-means Clustering via Nonnegative Low-rank Semidefinite Programming
ICLR 2024 (2023)
Abstract
K-means clustering is a widely used machine learning method for identifying
patterns in large datasets. Semidefinite programming (SDP) relaxations recently
proposed for the K-means optimization problem enjoy strong statistical
optimality guarantees, but the prohibitive cost of implementing an SDP solver
renders these guarantees inaccessible to practical
datasets. By contrast, nonnegative matrix factorization (NMF) is a simple
clustering algorithm that is widely used by machine learning practitioners, but
it lacks a solid statistical underpinning and rigorous guarantees. In this
paper, we describe an NMF-like algorithm that works by solving a nonnegative
low-rank restriction of the SDP relaxed K-means formulation using a nonconvex
Burer–Monteiro factorization approach. The resulting algorithm is just as
simple and scalable as state-of-the-art NMF algorithms, while also enjoying the
same strong statistical optimality guarantees as the SDP. In our experiments,
we observe that our algorithm achieves substantially smaller mis-clustering
errors than the existing state of the art.
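
To make the idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the standard SDP relaxation of K-means (maximize <A, Z> over Z that is positive semidefinite and entrywise nonnegative, with tr(Z) = K and Z1 = 1, where A is the Gram matrix of the data) and restricts it to Z = UU^T with an n-by-r factor U >= 0. Under that restriction the constraints become ||U||_F^2 = K and UU^T 1 = 1. The solver below is an illustrative projected-gradient scheme with an augmented Lagrangian term for the row-sum constraint. The function names `project` and `nonneg_lowrank_kmeans`, the Gram-matrix scaling, the step size `alpha`, the penalty `beta`, the iteration count, and the argmax rounding are all assumptions chosen for the demo.

```python
import numpy as np

def project(U, K):
    """Project onto {U >= 0, ||U||_F^2 = K}: clip negatives, then rescale."""
    U = np.maximum(U, 0.0)
    norm = np.linalg.norm(U)
    if norm == 0.0:  # degenerate case: fall back to a uniform feasible point
        U = np.ones_like(U)
        norm = np.linalg.norm(U)
    return U * (np.sqrt(K) / norm)

def nonneg_lowrank_kmeans(X, K, r=None, alpha=1e-3, beta=1.0, iters=200, seed=0):
    """Illustrative solver for the nonnegative low-rank restriction Z = U U^T.

    Hypothetical helper: hyperparameters are demo assumptions, not tuned values.
    """
    n = X.shape[0]
    r = K if r is None else r
    A = X @ X.T                               # Gram matrix of the data
    A = A / max(np.abs(A).max(), 1e-12)       # rescale for a stable step size (assumption)
    rng = np.random.default_rng(seed)
    U = project(rng.random((n, r)), K)        # random nonnegative feasible start
    y = np.zeros(n)                           # multiplier for U U^T 1 = 1
    ones = np.ones(n)
    for _ in range(iters):
        col_sums = U.sum(axis=0)              # U^T 1
        res = U @ col_sums - ones             # residual of U U^T 1 = 1
        w = y + beta * res
        # Gradient of -<A, UU^T> + <y, res> + (beta/2)||res||^2 with respect to U
        grad = -2.0 * (A @ U) + np.outer(w, col_sums) + np.outer(ones, w @ U)
        U = project(U - alpha * grad, K)      # primal projected-gradient step
        y = y + beta * (U @ U.sum(axis=0) - ones)  # dual ascent step
    return U

# Toy usage: two Gaussian blobs in the plane, 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])
U = nonneg_lowrank_kmeans(X, K=2)
labels = U.argmax(axis=1)                     # simple rounding: largest entry per row
```

The projection keeps every iterate nonnegative with squared Frobenius norm equal to K, so those two constraints hold exactly throughout. The row-sum constraint is only enforced approximately through the multiplier and penalty terms.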
Keywords
clustering, Burer-Monteiro, semidefinite programming