
Gradient Descent Converges Linearly for Logistic Regression on Separable Data

ICLR 2023

Abstract
We show that running gradient descent on the logistic regression objective guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$, where the error $\epsilon$ decays exponentially with the number of iterations. This is in contrast to the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and highlights the importance of variable learning rates for gradient descent. For separable data, our analysis proves that the error between the predictor returned by gradient descent and the hard SVM predictor decays as $\mathrm{poly}(1/t)$, exponentially faster than the previously known bound of $O(\log\log t / \log t)$. Our key observation is a property of the logistic loss that we call multiplicative smoothness and is (surprisingly) little-explored: As the loss decreases, the objective becomes (locally) smoother and therefore the learning rate can increase. Our results also extend to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.
Keywords
logistic regression, gradient descent, sparse optimization
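
The abstract's key idea, a learning rate that grows as the logistic loss shrinks, can be illustrated with a minimal sketch. This is not the paper's algorithm: the loss-proportional step-size rule eta_t = eta0 / f(x_t), the base constant eta0, and the synthetic separable data are illustrative assumptions motivated by the multiplicative-smoothness intuition (a smaller loss means a locally smoother objective, hence a larger safe step).

```python
# Minimal sketch (assumptions, not the paper's exact schedule): gradient descent
# on the logistic loss over linearly separable data, with a step size that is
# inversely proportional to the current loss.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: the first coordinate carries the label with margin >= 0.5.
n, d = 200, 5
y = rng.choice([-1.0, 1.0], size=n)
A = rng.normal(size=(n, d))
A[:, 0] = y * (0.5 + np.abs(rng.normal(size=n)))

def sigmoid(z):
    """Numerically stable elementwise sigmoid."""
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def loss(x):
    # f(x) = mean_i log(1 + exp(-y_i <a_i, x>)), computed stably
    return np.mean(np.logaddexp(0.0, -y * (A @ x)))

def grad(x):
    s = sigmoid(-y * (A @ x))          # per-example misclassification probability
    return -(A.T @ (y * s)) / n

x = np.zeros(d)
eta0 = 0.05                            # assumed base step size
for t in range(1, 2001):
    f = loss(x)
    eta = eta0 / max(f, 1e-12)         # step size increases as the loss decreases
    x = x - eta * grad(x)
    if t % 500 == 0:
        print(f"iter {t:5d}   loss {loss(x):.3e}   step {eta:.3e}")
```

Replacing the adaptive step with a fixed one (eta = eta0) gives an easy comparison: under the assumed schedule the printed loss should fall far faster once the data are fit with positive margins, in line with the linear-convergence claim in the abstract.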