On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width
CoRR (2023)
Abstract
Second-order optimization has been developed to accelerate the training of
deep neural networks, and it is being applied to increasingly large-scale
models. In this study, toward training at even larger scales, we identify a
specific parameterization for second-order optimization that promotes feature
learning in a stable manner even as the network width increases significantly.
Inspired by the maximal update parameterization (μP), we consider a one-step
update of the gradient and reveal the appropriate scales of hyperparameters,
including random initialization, learning rates, and damping terms. Our
approach covers two major second-order optimization algorithms, K-FAC and
Shampoo, and we demonstrate that our parameterization achieves higher
generalization performance in feature learning. In particular, it enables
hyperparameter transfer across models with different widths.
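To make concrete where the initialization, learning-rate, and damping scales enter, here is a minimal, hypothetical sketch in Python of a single Shampoo-style preconditioned update for one weight matrix. The exponents `a`, `b`, `c` and the toy gradient are placeholders for illustration only, not the scalings derived in the paper.

```python
import numpy as np

def inv_quarter_root(M, damping):
    """(M + damping*I)^(-1/4) via eigendecomposition, for symmetric PSD M."""
    w, V = np.linalg.eigh(M + damping * np.eye(M.shape[0]))
    return (V * w ** -0.25) @ V.T

def shampoo_step(W, G, L, R, lr, damping):
    """One Shampoo-style update: precondition the gradient G on both sides."""
    L = L + G @ G.T                       # accumulate left statistics
    R = R + G.T @ G                       # accumulate right statistics
    P = inv_quarter_root(L, damping) @ G @ inv_quarter_root(R, damping)
    return W - lr * P, L, R

# Width-dependent hyperparameter scales in the spirit of muP.
# The exponents below are hypothetical placeholders, not the paper's values.
width = 256
a, b, c = 0.5, 1.0, 1.0
init_std = width ** -a                    # scale of random initialization
lr = width ** -b                          # learning rate
damping = width ** -c                     # damping term

rng = np.random.default_rng(0)
W = rng.normal(0.0, init_std, size=(width, width))
G = rng.normal(0.0, 1.0 / width, size=W.shape)   # stand-in gradient
L = np.zeros((width, width))
R = np.zeros((width, width))
W, L, R = shampoo_step(W, G, L, R, lr, damping)
```

Under a parameterization of this kind, the base learning rate and damping tuned at a small width would be reused unchanged at larger widths, which is the hyperparameter transfer the abstract refers to.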