An Integer Programming Approach To Subspace Clustering With Missing Data

arXiv (Cornell University) (2023)

Abstract
In the Subspace Clustering with Missing Data (SCMD) problem, we are given a collection of n partially observed d-dimensional vectors. The data points are assumed to be concentrated near a union of low-dimensional subspaces. The goal of SCMD is to cluster the vectors according to their subspace membership and to recover the underlying basis, which can then be used to infer their missing entries. State-of-the-art algorithms for SCMD can fail on instances where the proportion of missing data is high, the data is full-rank, or the underlying subspaces are similar to each other. We propose a novel integer programming approach for SCMD. The approach is based on dynamically determining a set of candidate subspaces and optimally assigning points to selected subspaces. The problem structure is identical to that of the classical facility-location problem, with subspaces playing the role of facilities and data points that of customers. We propose a column-generation approach for identifying candidate subspaces, combined with a Benders decomposition approach for solving the linear programming relaxation of the formulation. An empirical study demonstrates that the proposed approach can achieve better clustering accuracy than state-of-the-art methods when the data is high-rank, the percentage of missing data is high, or the subspaces are similar.
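
The facility-location analogy can be made concrete with a small sketch. It is an illustration under stated assumptions, not the formulation from the paper: the cost of serving a point from a candidate subspace is assumed here to be the squared least-squares residual over the point's observed coordinates only, and the names `residual_cost` and `assign_points` are hypothetical. The sketch builds the point-by-subspace cost matrix and assigns each point to its cheapest candidate. Choosing which candidate subspaces to select (the integer program) and generating new candidates (column generation) are not shown.

```python
import numpy as np

def residual_cost(x, mask, U):
    """Squared distance from the observed entries of x to span(U).

    Only the rows of U at observed coordinates are used, so missing
    entries of x (which may be NaN) never enter the computation.
    Assumed cost definition, chosen for illustration.
    """
    U_obs = U[mask]                      # basis rows at observed coordinates
    x_obs = x[mask]                      # observed entries of the point
    coef, *_ = np.linalg.lstsq(U_obs, x_obs, rcond=None)
    r = x_obs - U_obs @ coef             # residual on observed coordinates
    return float(r @ r)

def assign_points(X, M, candidates):
    """Build the cost matrix and assign each point to its cheapest candidate.

    X: (n, d) array, arbitrary values (e.g. NaN) at missing entries.
    M: (n, d) boolean array, True where an entry is observed.
    candidates: list of (d, r) basis matrices (hypothetical candidate set).
    Returns (costs, labels) with costs of shape (n, len(candidates)).
    """
    costs = np.array([[residual_cost(x, m, U) for U in candidates]
                      for x, m in zip(X, M)])
    return costs, costs.argmin(axis=1)

# Toy usage: two one-dimensional candidate subspaces in R^3.
U1 = np.array([[1.0], [0.0], [0.0]])     # spanned by the first axis
U2 = np.array([[0.0], [1.0], [0.0]])     # spanned by the second axis
X = np.array([[2.0, 0.0, np.nan],        # third entry is missing
              [0.0, 3.0, 0.0]])
M = np.array([[True, True, False],
              [True, True, True]])
costs, labels = assign_points(X, M, [U1, U2])
```

In facility-location terms, `costs[i, j]` plays the role of the cost of serving customer `i` from facility `j`. The nearest-candidate assignment above is optimal only when every candidate is allowed to be selected; once the number of selected subspaces is restricted, selection and assignment have to be decided jointly.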
Keywords
integer programming approach,subspace