Automatic variable selection in a linear model on massive data

Communications in Statistics - Simulation and Computation (2022)

Abstract
For a linear model on massive data, we propose an aggregated estimator based on adaptive LASSO estimators. The proposed method reduces the data storage volume and yields an aggregated estimator that automatically selects the significant explanatory variables with probability converging to one. Moreover, the components of the aggregated estimator that correspond to the non-null true parameters have the same asymptotic normal distribution as the adaptive LASSO estimator computed on the full data set. However, when the model is fitted on massive data, the full-data estimator is practically impossible to compute because of insufficient memory or storage. A further advantage of our method is therefore that it works around the insufficient-memory problem that statistical software encounters when the number of observations is very large. The empirical performance is investigated in a comparative simulation study, and a real data example illustrates the usefulness of the method.
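The abstract describes a divide-and-conquer scheme: fit an adaptive LASSO on pieces of the data, keep only the small per-piece estimates, and combine them. The abstract does not state how the pieces are combined, so the sketch below is illustrative only and assumes equal-size blocks and a plain average of the block estimates. The adaptive weights come from an OLS pilot fit, the per-block penalty `lam = n_b ** -0.75` with `gamma = 1` is an illustrative tuning rate rather than the paper's choice, and all function names (`adaptive_lasso`, `aggregated_adaptive_lasso`, `simulated_blocks`) are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def adaptive_lasso(X, y, lam, gamma=1.0, max_iter=1000, tol=1e-8):
    """Adaptive LASSO by cyclic coordinate descent (illustrative, not tuned).

    Minimises (1/(2n)) * ||y - X b||^2 + lam * sum_j w_j |b_j|,
    with weights w_j = 1 / |b_ols_j|**gamma from an OLS pilot fit.
    """
    n, p = X.shape
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = 1.0 / np.maximum(np.abs(b_ols), 1e-12) ** gamma  # guard against division by zero
    col_sq = (X ** 2).sum(axis=0) / n                     # X_j^T X_j / n for each column
    b = np.zeros(p)
    r = y.astype(float).copy()                            # residual y - X b, with b = 0 initially
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (coordinate j added back).
            z = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = soft_threshold(z, lam * w[j]) / col_sq[j]
            if b_new != b[j]:
                r -= X[:, j] * (b_new - b[j])             # keep the residual in sync with b
                max_change = max(max_change, abs(b_new - b[j]))
                b[j] = b_new
        if max_change < tol:
            break
    return b


def aggregated_adaptive_lasso(blocks, gamma=1.0):
    """Fit adaptive LASSO on each block and average the block estimates.

    Only one block is held in memory at a time; the retained state is a
    running sum of p-dimensional estimates. Plain averaging is an assumption.
    """
    total, k = None, 0
    for X_b, y_b in blocks:
        lam = X_b.shape[0] ** -0.75                       # hypothetical per-block tuning rate
        est = adaptive_lasso(X_b, y_b, lam, gamma)
        total = est if total is None else total + est
        k += 1
    return total / k


def simulated_blocks(beta, n_blocks, n_per_block, seed=0):
    """Yield (X, y) blocks one at a time, as if streamed from disk."""
    rng = np.random.default_rng(seed)
    for _ in range(n_blocks):
        X = rng.standard_normal((n_per_block, beta.size))
        y = X @ beta + rng.standard_normal(n_per_block)
        yield X, y


if __name__ == "__main__":
    beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
    beta_hat = aggregated_adaptive_lasso(simulated_blocks(beta_true, 10, 2000))
    print(np.round(beta_hat, 3))
    print("selected variables:", np.flatnonzero(beta_hat))
```

Because the blocks are consumed from a generator, peak memory is governed by the block size rather than the total number of observations. This mirrors the storage-reduction argument in the abstract. Under this averaging assumption, a coefficient is reported as exactly zero only if every block's adaptive LASSO sets it to zero.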
Keywords
Adaptive LASSO, Aggregated estimator, Oracle properties, Storage volume reduction