Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency
CoRR (2024)
Abstract
We consider gradient descent (GD) with a constant stepsize applied to logistic regression with linearly separable data, where the constant stepsize η is so large that the loss initially oscillates. We show that GD exits this initial oscillatory phase rapidly, in 𝒪(η) steps, and subsequently achieves an 𝒪̃(1/(ηt)) convergence rate after t additional steps. Our results imply that, given a budget of T steps, GD can achieve an accelerated loss of 𝒪̃(1/T^2) with an aggressive stepsize η := Θ(T), without any use of momentum or variable stepsize schedulers. Our proof technique is versatile and also handles general classification loss functions (where exponential tails are needed for the 𝒪̃(1/T^2) acceleration), nonlinear predictors in the neural tangent kernel regime, and online stochastic gradient descent (SGD) with a large stepsize, under suitable separability conditions.
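To make the setting concrete, here is a minimal numerical sketch (not the authors' code) of the regime the abstract describes: GD with an aggressive constant stepsize η = Θ(T) on the logistic loss over synthetic linearly separable data. The data-generating process, dimensions, seed, and the exact stepsize constant are all illustrative assumptions; the behavior suggested by the paper is a non-monotone loss in an initial phase, followed by rapid decrease.

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

# Synthetic linearly separable data (assumption: labels from a linear teacher).
rng = np.random.default_rng(0)
n, d = 50, 5
w_teacher = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_teacher)
X = X * y[:, None]  # fold labels into X, so the loss depends only on margins X @ w

def logistic_loss(w):
    # mean over samples of log(1 + exp(-margin)); logaddexp avoids overflow
    return np.mean(np.logaddexp(0.0, -(X @ w)))

def gradient(w):
    # d/dm log(1 + exp(-m)) = -sigmoid(-m); average over samples
    s = -expit(-(X @ w))
    return (X.T @ s) / n

T = 1000
eta = float(T)  # aggressive constant stepsize, eta = Theta(T) as in the abstract
w = np.zeros(d)
for t in range(T):
    w -= eta * gradient(w)
    if t % 100 == 0 or t == T - 1:
        print(f"step {t:4d}  loss {logistic_loss(w):.3e}")
# Expected qualitative behavior per the paper: the loss oscillates (and may
# even increase) during the early phase, then decreases rapidly afterwards.
```

Folding the labels into the feature matrix is a standard simplification for separable classification: every sample then contributes a margin ⟨x_i, w⟩ that should be driven positive, so both the loss and the gradient can be written purely in terms of X @ w.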