Corrected generalized cross-validation for finite ensembles of penalized estimators
arXiv (2023)
Abstract
Generalized cross-validation (GCV) is a widely used method for estimating the
squared out-of-sample prediction risk that employs a scalar degrees of freedom
adjustment (in a multiplicative sense) to the squared training error. In this
paper, we examine the consistency of GCV for estimating the prediction risk of
arbitrary ensembles of penalized least-squares estimators. We show that GCV is
inconsistent for any finite ensemble of size greater than one. Towards
repairing this shortcoming, we identify a correction that involves an
additional scalar term (in an additive sense) based on degrees of freedom
adjusted training errors from each ensemble component. The proposed estimator
(termed CGCV) maintains the computational advantages of GCV and requires
neither sample splitting, model refitting, nor out-of-bag risk estimation. The
estimator stems from a finer inspection of the ensemble risk decomposition and
two intermediate risk estimators for the components in this decomposition. We
provide a non-asymptotic analysis of CGCV and the two intermediate risk
estimators for ensembles of convex penalized estimators under Gaussian features
and a linear response model. Furthermore, in the special case of ridge
regression, we extend the analysis to general feature and response
distributions using random matrix theory, which establishes model-free uniform
consistency of CGCV.
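
As a point of reference for the multiplicative adjustment described above, the following is a minimal sketch of ordinary GCV for a single ridge estimator: the mean squared training error divided by (1 - df/n)^2, where df is the trace of the smoother matrix. It does not implement the CGCV correction for ensembles, whose exact form is not given in the abstract. The function name `ridge_gcv`, the penalty scaling `n * lam`, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def ridge_gcv(X, y, lam):
    """Ordinary GCV risk estimate for one ridge fit (illustrative sketch)."""
    n, p = X.shape
    # Assumed penalty scaling: minimize ||y - Xb||^2 / n + lam * ||b||^2,
    # which gives the smoother matrix L = X (X^T X + n*lam*I)^{-1} X^T.
    G = X.T @ X + n * lam * np.eye(p)
    L = X @ np.linalg.solve(G, X.T)
    resid = y - L @ y
    train_err = np.mean(resid ** 2)   # squared training error
    df = np.trace(L)                  # scalar degrees of freedom
    # Multiplicative degrees-of-freedom adjustment of the training error.
    return train_err / (1.0 - df / n) ** 2

# Simulated Gaussian features with a linear response (hypothetical data).
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + rng.standard_normal(n)
gcv = ridge_gcv(X, y, lam=0.1)
```

Because df lies strictly between 0 and n here, the GCV value is always larger than the raw training error; the abstract's claim is that this single multiplicative factor is not enough once several such estimators are averaged.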