Improving cross-validated bandwidth selection using subsampling-extrapolation techniques

Computational Statistics & Data Analysis (2015)

Abstract
Cross-validation methodologies have been widely used to select tuning parameters in nonparametric statistical problems. In this paper we focus on a new method for improving the reliability of cross-validation. We implement this method in the context of the kernel density estimator, where one needs to select the bandwidth parameter so as to minimize the L2 risk. The method is a two-stage subsampling-extrapolation bandwidth selection procedure: it first evaluates the risk at a fictional sample size m (m ≤ sample size n) and then extrapolates the optimal bandwidth from m to n. This two-stage method can dramatically reduce the variability of the conventional unbiased cross-validation bandwidth selector. The simple first-order extrapolation estimator is equivalent to the rescaled "bagging-CV" bandwidth selector in Hall and Robinson (2009) if one sets the bootstrap size equal to the fictional sample size. However, our simplified expression for the risk estimator enables us to compute the aggregated risk without any bootstrapping. Furthermore, we develop a second-order extrapolation technique as an extension designed to improve the approximation of the true optimal bandwidth. To choose the fictional size m given a sample of size n, we propose a nested cross-validation methodology. In a simulation study, the proposed methods show promising performance across a wide selection of distributions. In addition, we investigate the asymptotic properties of the proposed bandwidth selectors.

Highlights
A two-stage subsampling-extrapolation bandwidth selection procedure is proposed.
An automatic nested cross-validation method is developed to select the subsample size.
The extrapolated bandwidth selectors achieve a smaller mean square error.
The second-order extrapolated bandwidth selector has a relative convergence rate n^{-1/4}.
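The two-stage idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than something the abstract states: the Gaussian kernel, the grid search, the function names `risk_at_size` and `extrapolated_bandwidth`, and the particular U-statistic form of the risk estimate (chosen so that it reduces to the usual unbiased cross-validation criterion when m = n). The rescaling factor (m/n)^(1/5) assumes the standard n^(-1/5) rate of the optimal bandwidth for a second-order kernel; the second-order extrapolation and the nested cross-validation choice of m are not shown.

```python
import numpy as np

def risk_at_size(h, x, m):
    """Estimate the L2 risk (up to a constant free of h) of a Gaussian-kernel
    density estimate at a fictional sample size m, using all pairs of the
    n observations (assumed U-statistic form; equals UCV when m == n)."""
    n = len(x)
    diffs = x[:, None] - x[None, :]
    off_diag = ~np.eye(n, dtype=bool)          # keep pairs with i != j
    u = diffs[off_diag] / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)      # K(u), standard normal
    kk = np.exp(-0.25 * u**2) / np.sqrt(4 * np.pi)    # (K*K)(u), N(0, 2) density
    r_k = 1.0 / (2.0 * np.sqrt(np.pi))                # integral of K^2
    pair_term = ((1.0 - 1.0 / m) * kk - 2.0 * k).sum() / (n * (n - 1) * h)
    return r_k / (m * h) + pair_term

def extrapolated_bandwidth(x, m, grid):
    """Stage 1: minimize the estimated risk at size m over a bandwidth grid.
    Stage 2: first-order extrapolation from m to n via (m/n)^(1/5)."""
    n = len(x)
    risks = [risk_at_size(h, x, m) for h in grid]
    h_m = grid[int(np.argmin(risks))]
    return h_m * (m / n) ** 0.2

# Illustrative usage on simulated data (values are arbitrary choices).
rng = np.random.default_rng(0)
sample = rng.standard_normal(200)
grid = np.linspace(0.05, 1.5, 60)
h_hat = extrapolated_bandwidth(sample, m=50, grid=grid)
```

Because the pairwise sums run over the full sample, no resampling loop is needed, which mirrors the abstract's remark that the aggregated risk can be computed without bootstrapping.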
Keywords
Bandwidth selection, Cross-validation, Extrapolation, L2 distance, Nonparametric kernel density estimator, Subsampling