Optimization and Adaptive Generalization of Three-Layer Neural Networks
International Conference on Learning Representations (ICLR)(2022)
Abstract
While there has been substantial recent work studying generalization of neural networks,
the ability of deep nets to automate the process of feature extraction still evades a thorough mathematical understanding.
As a step toward this goal, we analyze learning and generalization of a three-layer neural network with ReLU activations in a regime that goes beyond the linear approximation of the network, and is hence not captured by the common Neural Tangent Kernel. We show that despite nonconvexity of the empirical loss, a variant of SGD converges in polynomially many iterations to a good solution that generalizes. In particular, our generalization bounds are adaptive: they automatically optimize over a family of kernels that includes the Neural Tangent Kernel, to provide the tightest bound.
Keywords
deep learning theory, adaptive kernel, robust deep learning, neural tangent kernel, adaptive generalization, non-convex optimization