A Continuous-Time View of Early Stopping for Least Squares Regression.

arXiv: Machine Learning (2019)

Abstract
We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression. We take a continuous-time view, i.e., consider infinitesimal step sizes in gradient descent, in which case the iterates form a trajectory called gradient flow. Our primary focus is to compare the risk of gradient flow to that of ridge regression. Under the calibration $t = 1/\lambda$---where $t$ is the time parameter in gradient flow, and $\lambda$ the tuning parameter in ridge regression---we prove that the risk of gradient flow is no more than 1.69 times that of ridge, along the entire path (for all $t \geq 0$). This holds in finite samples with very weak assumptions on the data model (in particular, with no assumptions on the features $X$). We prove that the same relative risk bound holds for prediction risk, in an average sense over the underlying signal $\beta_0$. Finally, we examine limiting risk expressions (under standard Marchenko-Pastur asymptotics), and give supporting numerical experiments.
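
To make the comparison concrete, here is a minimal numerical sketch (not the authors' code) of the two estimators under the calibration $t = 1/\lambda$. It uses the closed-form gradient flow solution $\hat\beta_{\mathrm{gf}}(t) = S^{+}(I - e^{-tS})\,b$, where $S = X^\top X/n$ and $b = X^\top y/n$, started from zero. The data model, dimensions, and noise level are illustrative assumptions, and a single-draw squared error stands in as a proxy for the (expected) risk.

```python
# Minimal sketch comparing gradient flow and ridge under t = 1/lambda.
# All problem sizes and the data-generating model below are assumptions
# for illustration, not taken from the paper.
import numpy as np
from scipy.linalg import expm  # matrix exponential for the gradient flow path

rng = np.random.default_rng(0)
n, p, sigma = 200, 50, 1.0                 # assumed sample size, dimension, noise sd
X = rng.standard_normal((n, p))
beta0 = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta0 + sigma * rng.standard_normal(n)

S = X.T @ X / n                             # (uncentered) sample covariance
b = X.T @ y / n

def beta_gf(t):
    """Gradient flow on the loss ||y - X beta||^2 / (2n), started at zero:
    beta(t) = S^+ (I - exp(-t S)) b."""
    return np.linalg.pinv(S) @ (np.eye(p) - expm(-t * S)) @ b

def beta_ridge(lam):
    """Ridge estimator with tuning parameter lam: (S + lam I)^{-1} b."""
    return np.linalg.solve(S + lam * np.eye(p), b)

# Single-draw squared error along the calibrated path (a proxy for risk).
for t in [0.1, 1.0, 10.0, 100.0]:
    r_gf = np.sum((beta_gf(t) - beta0) ** 2)
    r_ridge = np.sum((beta_ridge(1.0 / t) - beta0) ** 2)
    print(f"t={t:6.1f}  gf={r_gf:.4f}  ridge={r_ridge:.4f}  ratio={r_gf / r_ridge:.3f}")
```

On draws like this one, the ratio in the last column stays modest along the path, which is the behavior the paper's 1.69 relative risk bound describes (the bound itself concerns expected risk, not a single realization).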