Computationally Feasible Near-Optimal Subset Selection for Linear Regression under Measurement Constraints

arXiv: Machine Learning (2016)

Abstract
Computationally feasible and statistically near-optimal subset selection strategies are derived to select a small portion of design (data) points in a linear regression model $y = X\beta + \varepsilon$, reducing measurement cost while preserving data efficiency. We consider two subset selection algorithms for estimating the model coefficients $\beta$: the first is a random subsampling based method that achieves optimal statistical performance up to a small $(1+\epsilon)$ relative factor under the with-replacement model, and up to an $O(\log k)$ multiplicative factor under the without-replacement model, where $k$ denotes the measurement budget. The second algorithm is fully deterministic and achieves a $(1+\epsilon)$ relative approximation under the without-replacement model, at the cost of a slightly worse dependency of $k$ on the number of variables (data dimension) in the linear regression model. Finally, we show how our method can be extended to the corresponding prediction problem, and remark on interpretable sampling (selection) of data points under random design frameworks.
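The abstract does not spell out the sampling distribution, but random subsampling methods of this kind are commonly built on statistical leverage scores. The sketch below is a hedged illustration, not the paper's exact algorithm: it samples $k$ rows of $X$ with replacement proportionally to their leverage scores, "measures" $y$ only at the selected points, and fits a reweighted least-squares estimate so the subsampled objective is unbiased for the full one. All function names here are illustrative.

```python
import numpy as np

def leverage_score_subsample(X, k, rng=None):
    """Sample k row indices of X with replacement, with probability
    proportional to leverage scores (squared row norms of the thin-QR factor Q)."""
    rng = np.random.default_rng(rng)
    Q, _ = np.linalg.qr(X)                 # thin QR; leverage of row i is ||Q[i]||^2
    scores = np.sum(Q**2, axis=1)
    probs = scores / scores.sum()
    idx = rng.choice(X.shape[0], size=k, replace=True, p=probs)
    return idx, probs

def subsampled_ols(X, y_selected, idx, probs):
    """Weighted OLS on the selected rows; weights 1/(k * p_i) make the
    subsampled normal equations unbiased estimates of the full ones."""
    w = 1.0 / (len(idx) * probs[idx])
    Xs = X[idx] * np.sqrt(w)[:, None]
    ys = y_selected * np.sqrt(w)
    beta_hat, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta_hat

# Usage: the design X is known up front; labels are measured only at
# the k selected points, matching the measurement-constrained setting.
rng = np.random.default_rng(0)
n, p, k = 2000, 5, 200
X = rng.standard_normal((n, p))
beta = np.arange(1.0, p + 1)
idx, probs = leverage_score_subsample(X, k, rng=1)
y_sel = X[idx] @ beta + 0.1 * rng.standard_normal(k)   # measure only selected points
beta_hat = subsampled_ols(X, y_sel, idx, probs)
```

With a budget of $k = 200$ measurements out of $n = 2000$ candidate points, the reweighted estimate recovers $\beta$ closely under this mild noise level; the with-replacement guarantee quoted in the abstract says such subsampling loses only a $(1+\epsilon)$ factor relative to using all $n$ measurements.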