One-Pass Additive-Error Subset Selection for ℓ _p Subspace Approximation and ( k , p )-Clustering

Algorithmica(2023)

引用 0|浏览4
暂无评分
摘要
We consider the problem of subset selection for ℓ _p subspace approximation and ( k , p )-clustering. Our aim is to efficiently find a small subset of data points such that solving the problem optimally for this subset gives a good approximation to solving the problem optimally for the original input. For ℓ _p subspace approximation, previously known subset selection algorithms based on volume sampling and adaptive sampling proposed in Deshpande and Varadarajan (STOC’07, 2007), for the general case of p ∈ [1, ∞ ) , require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for ℓ _p subspace approximation, for any p ∈ [1, ∞ ) . Earlier subset selection algorithms that give a one-pass multiplicative (1+ϵ ) approximation work under the special cases. Cohen et al. (SODA’17, 2017) gives a one-pass subset section that offers multiplicative (1+ϵ ) approximation guarantee for the special case of ℓ _2 subspace approximation. Mahabadi et al. (STOC’20, 2020) gives a one-pass noisy subset selection with (1+ϵ ) approximation guarantee for ℓ _p subspace approximation when p ∈{1, 2} . Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any p ∈ [1, ∞ ) . We also focus on ( k , p )-clustering, where the task is to group the data points into k clusters such that the sum of distances from points to cluster centers (raised to the power p ) is minimized for p∈ [1, ∞ ) . The subset selection algorithms are based on D^p sampling due to Wei (NIPS’16, 2016) which is an extension of D^2 sampling proposed in Arthur and Vassilvitskii (SODA’07, 2007). Due to the inherently adaptive nature of the D^p sampling, these algorithms require taking multiple passes over the input. In this work, we suggest one pass subset selection for ( k , p )-clustering that gives constant factor approximation with respect to the optimal solution with an additive approximation guarantee. Bachem et al. (NIPS’16, 2016) also gives one pass subset selection for k -means for p=2 ; however, our result gives a solution for a more generic problem when p ∈ [1,∞ ) . At the core, our contribution lies in showing a one-pass MCMC-based subset selection algorithm such that its cost incurred due to the sampled points closely approximates the corresponding optimal cost, with high probability.
更多
查看译文
关键词
Subspace approximation,k-means clustering,Subset selection.
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要