NALA: A Nesterov Accelerated Look-Ahead optimizer for deep neural networks

Research Square (2023)

Abstract

A series of adaptive gradient algorithms, such as Adadelta [1], Adam [2], AdamW [3], Adan [4], and AdaXod [5], has been used successfully to train deep neural networks (DNNs). Previous work reveals that adaptive gradient algorithms mainly borrow the moving-average idea of heavy-ball acceleration to estimate the first- and second-order moments of the gradient and thereby accelerate convergence [4]. However, Nesterov acceleration, which uses the gradient at an extrapolation point, can in theory achieve a faster convergence rate than heavy-ball acceleration. In this paper, a new optimization algorithm called NALA, which combines an adaptive gradient method with Nesterov acceleration through a look-ahead scheme, is proposed for deep learning. NALA iteratively updates two sets of weights, i.e., the ‘fast weights’ in its inner loop and the ‘slow weights’ in its outer loop. Concretely, NALA first updates the fast weights k times with the Adam optimizer in the inner loop, and then updates the slow weights once in the direction of Nesterov’s Accelerated Gradient (NAG) in the outer loop. We compare NALA with several popular optimization algorithms on a range of image classification tasks on public datasets. The experimental results show that NALA achieves faster convergence and higher accuracy than the other optimizers.
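The abstract describes NALA only at a high level: k Adam steps on the fast weights, then one Nesterov-style step on the slow weights. The sketch below is a minimal PyTorch illustration of that look-ahead structure, assuming a Nesterov-momentum update of the slow weights toward the current fast weights; the class name `NesterovLookahead` and the hyperparameters `k`, `alpha`, and `mu` are illustrative assumptions, not the authors' NALA implementation.

```python
import torch


class NesterovLookahead:
    """Illustrative look-ahead wrapper (not the authors' NALA code).

    Wraps an inner optimizer (e.g. Adam) acting on the 'fast weights' and,
    every k inner steps, moves a separate copy of 'slow weights' toward the
    fast weights with a Nesterov-style momentum step, then restarts the fast
    weights from the slow weights."""

    def __init__(self, base_optimizer, k=5, alpha=0.5, mu=0.9):
        self.base = base_optimizer   # inner optimizer on the fast weights
        self.k = k                   # inner (fast) steps per outer (slow) step
        self.alpha = alpha           # slow-step size (assumed hyperparameter)
        self.mu = mu                 # slow-step momentum (assumed hyperparameter)
        self.step_count = 0
        self.state = {}
        for group in self.base.param_groups:
            for p in group["params"]:
                self.state[p] = {
                    "slow": p.detach().clone(),   # slow weights
                    "buf": torch.zeros_like(p),   # slow-weight momentum buffer
                }

    @torch.no_grad()
    def step(self, closure=None):
        loss = self.base.step(closure)   # one fast (inner) step, e.g. Adam
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group in self.base.param_groups:
                for p in group["params"]:
                    st = self.state[p]
                    d = p.detach() - st["slow"]   # look-ahead direction
                    buf = st["buf"]
                    buf.mul_(self.mu).add_(d)     # buf = mu * buf + d
                    # Nesterov-style step along the momentum look-ahead d + mu*buf
                    st["slow"].add_(d + self.mu * buf, alpha=self.alpha)
                    p.copy_(st["slow"])           # restart fast weights at the slow weights
        return loss

    def zero_grad(self, set_to_none=True):
        self.base.zero_grad(set_to_none=set_to_none)


# Illustrative usage: wrap Adam as the inner optimizer.
# inner = torch.optim.Adam(model.parameters(), lr=1e-3)
# opt = NesterovLookahead(inner, k=5, alpha=0.5, mu=0.9)
# for x, y in loader:
#     opt.zero_grad()
#     loss = criterion(model(x), y)
#     loss.backward()
#     opt.step()
```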