Polynomial-time universality and limitations of deep learning

Communications on Pure and Applied Mathematics (2023)

Abstract
The goal of this paper is to characterize the function distributions that general neural networks trained by descent algorithms (GD/SGD) can or cannot learn in polynomial time. The results are: (1) The paradigm of general neural networks trained by SGD is poly-time universal: any function distribution that can be learned from samples in polynomial time can also be learned by a poly-size neural net trained by SGD with polynomial parameters. In particular, this can be achieved despite polynomial noise on the gradients, implying a separation between SGD-based deep learning and statistical query algorithms, since the latter are not comparably universal due to cases such as parities. This also shows that deep learning does not suffer from the limitations of shallow networks. (2) The paper further gives a lower bound on the generalization error of descent algorithms, which relies on two quantities: the cross-predictability, an average-case quantity related to the statistical dimension, and the null-flow, a quantity specific to descent algorithms. The lower bound implies in particular that, for functions of low enough cross-predictability, the above robust universality breaks down once the gradients are averaged over too many samples (as in perfect GD) rather than fewer (as in SGD). (3) Finally, it is shown that if larger amounts of noise are added to the initialization and to the gradients, then SGD is no longer comparably universal, again due to distributions of low enough cross-predictability.
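To make the training paradigm in result (1) concrete, the following is a minimal hypothetical sketch (not the paper's construction) of one-sample SGD on a small ReLU network, with bounded noise added to each gradient before the update, i.e., the "polynomial noise on the gradients" mentioned in the abstract. The architecture, target function, step size, and noise scale are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: one-sample SGD on a small ReLU network with additive
# gradient noise at every step. All hyperparameters are assumptions for the demo.
rng = np.random.default_rng(0)
n, width = 10, 64
W1 = rng.normal(0.0, 1.0 / np.sqrt(n), (width, n))   # hidden-layer weights
w2 = rng.normal(0.0, 1.0 / np.sqrt(width), width)    # output-layer weights

def forward(x):
    h = np.maximum(W1 @ x, 0.0)        # ReLU activations
    return float(w2 @ h), h

def noisy_sgd_step(x, y, lr=0.05, noise_scale=1e-3):
    """One SGD step on the squared loss, with independent noise added to each gradient."""
    global W1, w2
    pred, h = forward(x)
    err = pred - y
    g_w2 = err * h
    g_W1 = np.outer(err * w2 * (h > 0), x)
    w2 -= lr * (g_w2 + noise_scale * rng.normal(size=g_w2.shape))
    W1 -= lr * (g_W1 + noise_scale * rng.normal(size=g_W1.shape))

# Toy target (an assumption for the demo): parity of the first two coordinates.
for _ in range(20000):
    x = rng.integers(0, 2, n).astype(float)
    y = 1.0 if int(x[0] + x[1]) % 2 else -1.0
    noisy_sgd_step(x, y)
```

Result (2) hinges on the cross-predictability of the function distribution. Assuming it is defined as the mean squared correlation between two independently drawn target functions (the definition used in the authors' related work), the sketch below estimates it for uniformly random parity functions; since distinct parities are exactly orthogonal under the uniform input distribution, the quantity collapses to the collision probability 2^{-n}, which is exponentially small and hence in the regime where the abstract's lower bound applies. Function names and parameters here are hypothetical.

```python
import itertools
import random

# Hypothetical sketch: estimating the cross-predictability
#   Pred(P_F, P_X) = E_{F, F'} [ ( E_X [ F(X) F'(X) ] )^2 ],  F, F' i.i.d. targets,
# for parity functions chi_S(x) = (-1)^{sum_{i in S} x_i}, with S a uniformly random
# subset of {0, ..., n-1} and X uniform on {0, 1}^n.
def parity(S, x):
    return -1.0 if sum(x[i] for i in S) % 2 else 1.0

def cross_predictability_parities(n, n_pairs=5000, seed=0):
    rng = random.Random(seed)
    inputs = list(itertools.product((0, 1), repeat=n))   # exact inner expectation over X
    total = 0.0
    for _ in range(n_pairs):
        S = frozenset(i for i in range(n) if rng.random() < 0.5)
        T = frozenset(i for i in range(n) if rng.random() < 0.5)
        corr = sum(parity(S, x) * parity(T, x) for x in inputs) / len(inputs)
        total += corr ** 2
    return total / n_pairs

if __name__ == "__main__":
    for n in (4, 6, 8):
        est = cross_predictability_parities(n)
        print(f"n={n}: estimated cross-predictability {est:.4f} vs 2^-n = {2 ** -n:.4f}")
```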
Keywords
deep learning, limitations