Optimal sampling for stochastic and natural gradient descent
arXiv (2024)
Abstract
We consider the problem of optimising the expected value of a loss functional
over a nonlinear model class of functions, assuming that we only have access
to realisations of the gradient of the loss. This is a classical task in
statistics, machine learning and physics-informed machine learning. A
straightforward solution is to replace the exact objective with a Monte Carlo
estimate before employing standard first-order methods like gradient descent,
which yields the classical stochastic gradient descent method. But replacing
the true objective with an estimate incurs a “generalisation error”. Rigorous
bounds for this error typically require strong compactness and Lipschitz
continuity assumptions while providing only a very slow decay with sample size. We
propose a different optimisation strategy relying on a natural gradient descent
in which the true gradient is approximated in local linearisations of the model
class via (quasi-)projections based on optimal sampling methods. Under
classical assumptions on the loss and the nonlinear model class, we prove that
this scheme converges almost surely monotonically to a stationary point of the
true objective and we provide convergence rates.
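
The baseline the abstract contrasts against is plain stochastic gradient descent on a Monte Carlo estimate of the expected loss. The following is a minimal illustrative sketch, not the paper's code: the model class, the squared loss, the data-generating function and all step sizes are assumptions made for the example.

```python
# Sketch of stochastic gradient descent on a Monte Carlo estimate of
# the expected loss E_x[ 0.5 * (u_theta(x) - y(x))^2 ].
# Model, loss, and sampling choices are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

def model(theta, x):
    # toy nonlinear model class: u_theta(x) = theta_0 * sin(theta_1 * x)
    return theta[0] * np.sin(theta[1] * x)

def grad_loss(theta, x, y):
    # gradient in theta of 0.5 * (u_theta(x) - y)^2 for one realisation (x, y)
    r = model(theta, x) - y
    return np.array([r * np.sin(theta[1] * x),
                     r * theta[0] * x * np.cos(theta[1] * x)])

def sgd(theta, n_steps=1000, batch=32, lr=1e-2):
    for _ in range(n_steps):
        x = rng.uniform(-np.pi, np.pi, size=batch)   # fresh samples each step
        y = np.sin(2.0 * x)                          # realisations of the target
        g = np.mean([grad_loss(theta, xi, yi) for xi, yi in zip(x, y)], axis=0)
        theta = theta - lr * g                       # first-order update
    return theta

theta_sgd = sgd(np.array([0.5, 1.5]))
```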
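The proposed alternative replaces the raw stochastic gradient by a (quasi-)projection of the loss gradient onto the local linearisation (tangent space) of the model class, estimated from samples. The sketch below illustrates this idea under stated assumptions: the proposal density, importance weights, and regularisation are stand-ins for the optimal sampling scheme analysed in the paper, not a reproduction of it.

```python
# Sketch of one natural-gradient step: project the pointwise loss gradient
# onto span{ d u_theta / d theta_k } via an importance-weighted least squares.
# Uniform sampling and unit weights below are placeholder assumptions; the
# paper's optimal sampling would adapt the density to the tangent basis.
import numpy as np

rng = np.random.default_rng(1)

def model(theta, x):
    return theta[0] * np.sin(theta[1] * x)

def tangent_basis(theta, x):
    # partial derivatives of u_theta(x) w.r.t. theta; columns span the tangent space
    return np.stack([np.sin(theta[1] * x),
                     theta[0] * x * np.cos(theta[1] * x)], axis=-1)

def natural_gradient_step(theta, n_samples=256, lr=1e-1):
    # 1) draw samples from a proposal density (uniform here, as an assumption)
    x = rng.uniform(-np.pi, np.pi, size=n_samples)
    w = np.ones(n_samples)                        # importance weights (constant here)

    # 2) pointwise gradient of 0.5*(u - y)^2 in u, and the tangent basis
    r = model(theta, x) - np.sin(2.0 * x)
    B = tangent_basis(theta, x)                   # shape (n_samples, n_params)

    # 3) weighted least-squares projection: solve (B^T W B) c = B^T W r
    W = np.diag(w / n_samples)
    G = B.T @ W @ B                               # empirical Gram (Fisher-like) matrix
    b = B.T @ W @ r
    c = np.linalg.solve(G + 1e-10 * np.eye(len(theta)), b)

    # 4) step along the projected (natural) gradient direction
    return theta - lr * c

theta = np.array([0.5, 1.5])
for _ in range(200):
    theta = natural_gradient_step(theta)
```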