Optimization and Adaptive Generalization of Three-Layer Neural Networks

International Conference on Learning Representations (ICLR), 2022

Abstract
While there has been substantial recent work studying generalization of neural networks, the ability of deep nets to automate the process of feature extraction still evades a thorough mathematical understanding. As a step toward this goal, we analyze learning and generalization of a three-layer neural network with ReLU activations in a regime that goes beyond the linear approximation of the network and is hence not captured by the common Neural Tangent Kernel. We show that despite nonconvexity of the empirical loss, a variant of SGD converges in polynomially many iterations to a good solution that generalizes. In particular, our generalization bounds are adaptive: they automatically optimize over a family of kernels that includes the Neural Tangent Kernel, to provide the tightest bound.
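To make the setting concrete, the sketch below shows the kind of model the abstract refers to: a three-layer fully connected network with ReLU activations, trained by gradient descent on a nonconvex empirical loss. This is illustrative only; the paper's specific SGD variant, initialization, and all hyperparameters here (widths, learning rate, data) are assumptions, not taken from the source.

```python
# Illustrative sketch only: a generic three-layer ReLU network trained with
# plain SGD. The paper's actual SGD variant and regularization are not
# reproduced here; every hyperparameter below is an assumption.
import torch
import torch.nn as nn

class ThreeLayerReLU(nn.Module):
    def __init__(self, d_in: int, m1: int, m2: int, d_out: int):
        super().__init__()
        # Two hidden ReLU layers followed by a linear output layer.
        self.net = nn.Sequential(
            nn.Linear(d_in, m1), nn.ReLU(),
            nn.Linear(m1, m2), nn.ReLU(),
            nn.Linear(m2, d_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Toy training loop on synthetic data (hypothetical setup).
torch.manual_seed(0)
X = torch.randn(256, 10)   # 256 samples, 10 input features
y = torch.randn(256, 1)    # regression targets
model = ThreeLayerReLU(d_in=10, m1=128, m2=128, d_out=1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)  # nonconvex empirical loss
    loss.backward()
    opt.step()
```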
Keywords
deep learning theory,adaptive kernel,robust deep learning,neural tangent kernel,adaptive generalization,non-convex optimization