Fundamental limits of weak learnability in high-dimensional multi-index models
CoRR (2024)
Abstract
Multi-index models – functions which only depend on the covariates through a
non-linear transformation of their projection on a subspace – are a useful
benchmark for investigating feature learning with neural networks. This paper
examines the theoretical boundaries of learnability in this hypothesis class,
focusing particularly on the minimum sample complexity required for weakly
recovering their low-dimensional structure with first-order iterative
algorithms, in the high-dimensional regime where the number of samples
n = αd is proportional to the covariate dimension d. Our findings
unfold in three parts: (i) first, we identify under which conditions a
trivial subspace can be learned with a single step of a first-order
algorithm for any α>0; (ii) second, in the case where the trivial
subspace is empty, we provide necessary and sufficient conditions for the
existence of an easy subspace consisting of directions that can be
learned only above a certain sample complexity α>α_c. The
critical threshold α_c marks the presence of a computational phase
transition, in the sense that no efficient iterative algorithm can succeed for
α<α_c. In a limited but interesting set of really hard
directions – akin to the parity problem – α_c is found to diverge.
Finally, (iii) we demonstrate that interactions between different directions
can result in an intricate hierarchical learning phenomenon, where some
directions can be learned sequentially when coupled to easier ones. Our
analytical approach is built on the optimality of approximate message-passing
algorithms among first-order iterative methods, delineating the fundamental
learnability limit across a broad spectrum of algorithms, including neural
networks trained with gradient descent.
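Point (i) can be illustrated with a minimal sketch. It assumes the single-index special case (one hidden direction, taken to be e_1), standard Gaussian covariates, and the hypothetical link g(z) = tanh(z). For such a link c = E[g(z) z] is nonzero, which is a sufficient (not the paper's general) condition for a direction to be trivial. One gradient step from w = 0 on the correlation loss returns (1/n) Σ_i y_i x_i. A short calculation gives its squared overlap with the hidden direction as c² / (c² + E[g(z)²]/α) when d is large, which stays positive for every α > 0. The helper names are hypothetical, and this is not the paper's AMP analysis.

```python
import math
import random


def sample_single_index(n, d, g, rng):
    """Hypothetical helper: n pairs with x ~ N(0, I_d) and y = g(x[0]).

    The hidden direction is taken to be e_1, an arbitrary illustrative choice.
    """
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [g(x[0]) for x in xs]
    return xs, ys


def one_step_estimate(xs, ys, d):
    """One gradient step of size 1 from w = 0 on L(w) = -(1/n) sum_i y_i (w . x_i).

    The gradient at any w is -(1/n) sum_i y_i x_i, so the step returns
    (1/n) sum_i y_i x_i.
    """
    n = len(xs)
    w = [0.0] * d
    for x, y in zip(xs, ys):
        for j in range(d):
            w[j] += y * x[j] / n
    return w


def overlap_with_e1(w):
    """Cosine between the estimate w and the hidden direction e_1."""
    return abs(w[0]) / math.sqrt(sum(a * a for a in w))


rng = random.Random(0)
d = 200
results = {}
for alpha in (0.5, 2.0):  # proportional regime: n = alpha * d samples
    n = int(alpha * d)
    xs, ys = sample_single_index(n, d, math.tanh, rng)  # hypothetical link g = tanh
    results[alpha] = overlap_with_e1(one_step_estimate(xs, ys, d))
print({a: round(m, 2) for a, m in results.items()})
```

For comparison, a random unit vector has overlap of order 1/√d ≈ 0.07 with e_1 at d = 200.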
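Point (ii) concerns directions that carry no signal in a single step but become learnable at larger sample complexity. A rough illustration uses the hypothetical phase-retrieval-like link g(z) = z², hidden direction e_1, and Gaussian covariates. Because g is even, E[g(z) z] = 0, so one correlation step sees nothing. A second-moment (spectral) estimator can still succeed once α is large enough. The sketch uses D = (1/n) Σ_i (y_i - 1) x_i x_iᵀ, whose expectation is 2 e_1 e_1ᵀ because E[(z² - 1) z²] = 2. The preprocessing y - 1 is a simple illustrative choice rather than an optimal one, so the α at which this particular method starts to work need not coincide with the α_c of the paper. The function name and parameters are hypothetical.

```python
import math
import random


def spectral_estimate(xs, ys, d, iters=30, seed=1):
    """Power iteration on D = (1/n) sum_i (y_i - 1) x_i x_i^T.

    D is never stored: each step computes D v = (1/n) sum_i (y_i - 1)(x_i . v) x_i.
    Power iteration finds the eigenvector of largest |eigenvalue|. For large alpha
    this is expected to be the positive spike near 2 created by the hidden direction.
    """
    rng = random.Random(seed)
    n = len(xs)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start
    for _ in range(iters):
        new = [0.0] * d
        for x, y in zip(xs, ys):
            coef = (y - 1.0) * sum(a * b for a, b in zip(x, v)) / n
            for j in range(d):
                new[j] += coef * x[j]
        norm = math.sqrt(sum(a * a for a in new))
        v = [a / norm for a in new]
    return v


rng = random.Random(0)
d, alpha = 40, 20.0
n = int(alpha * d)
xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
ys = [x[0] ** 2 for x in xs]  # hypothetical even link g(z) = z^2 along e_1
v = spectral_estimate(xs, ys, d)
spectral_overlap = abs(v[0])  # v has unit norm, so this is |<v, e_1>|
print(round(spectral_overlap, 2))
```

Re-running with much smaller α makes the top eigenvector drift into the noise bulk, which is the qualitative behaviour behind a threshold such as α_c.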
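For the really hard, parity-like directions, one intuition can be checked numerically. Take the hypothetical link y = sign(z1 z2 z3) with independent standard Gaussian z. Every first- and second-order correlation vanishes, E[y z_i] = 0 and E[y z_i z_j] = 0, while the third-order one does not: E[y z1 z2 z3] = (E|z|)³ = (2/π)^{3/2} ≈ 0.508. Estimators built from low-order moments, such as a single correlation step or a second-moment spectral method, therefore receive no signal along these directions. This Monte Carlo check only illustrates the vanishing moments. It is not the paper's argument that α_c diverges.

```python
import math
import random
from itertools import combinations_with_replacement

rng = random.Random(0)
n, k = 100_000, 3
first = [0.0] * k  # Monte Carlo estimates of E[y z_i]
second = {p: 0.0 for p in combinations_with_replacement(range(k), 2)}  # E[y z_i z_j]
third = 0.0  # estimate of E[y z1 z2 z3]

for _ in range(n):
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    y = 1.0 if z[0] * z[1] * z[2] > 0 else -1.0  # hypothetical parity-like link
    for i in range(k):
        first[i] += y * z[i] / n
    for i, j in second:  # only values change, so iterating the keys is safe
        second[(i, j)] += y * z[i] * z[j] / n
    third += y * z[0] * z[1] * z[2] / n

print("max |E[y z_i]|     ~", round(max(abs(m) for m in first), 3))
print("max |E[y z_i z_j]| ~", round(max(abs(m) for m in second.values()), 3))
print("E[y z1 z2 z3]      ~", round(third, 3), "vs", round((2 / math.pi) ** 1.5, 3))
```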
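The sequential mechanism in (iii) can be caricatured with a staircase-style toy. It assumes the hypothetical two-index link y = z1 + z1 z2, with hidden directions e_1 and e_2 and Gaussian covariates. On its own the second direction has no linear correlation with the label, E[y z2] = 0, so a plain correlation step recovers only e_1, since E[y z1] = 1. Once an estimate ŵ1 of the first direction is available, reweighting the labels by ŵ1 · x exposes the second one, because E[y z1 z2] = 1. The link, the fresh samples in the second step (used to keep the statistics simple), and all helper names are illustrative assumptions. The paper's analysis of this phenomenon relies on AMP, not on this procedure.

```python
import math
import random


def draw(n, d, rng):
    """Hypothetical data: x ~ N(0, I_d), hidden directions e_1 and e_2,
    label y = z1 + z1 * z2 (illustrative staircase-style link)."""
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [x[0] + x[0] * x[1] for x in xs]
    return xs, ys


def correlate(xs, weights, d):
    """(1/n) sum_i weights_i x_i, i.e. one correlation step from zero."""
    n = len(xs)
    out = [0.0] * d
    for x, c in zip(xs, weights):
        for j in range(d):
            out[j] += c * x[j] / n
    return out


def normalize(v):
    s = math.sqrt(sum(a * a for a in v))
    return [a / s for a in v]


rng = random.Random(0)
d, n = 50, 1000

# Step 1: correlating with y picks up e_1 (E[y z1] = 1) but not e_2 (E[y z2] = 0).
xs, ys = draw(n, d, rng)
w1_hat = normalize(correlate(xs, ys, d))

# Step 2, on fresh samples: weight each label by the learned projection w1_hat . x.
# Since E[y z1 z2] = 1, the second direction now carries a linear signal.
xs, ys = draw(n, d, rng)
weights = [y * sum(a * b for a, b in zip(w1_hat, x)) for x, y in zip(xs, ys)]
v = correlate(xs, weights, d)
proj = sum(a * b for a, b in zip(v, w1_hat))
w2_hat = normalize([a - proj * b for a, b in zip(v, w1_hat)])  # drop the w1_hat part

step1_overlap_e1, step1_overlap_e2 = abs(w1_hat[0]), abs(w1_hat[1])
step2_overlap_e2 = abs(w2_hat[1])
print(round(step1_overlap_e1, 2), round(step1_overlap_e2, 2), round(step2_overlap_e2, 2))
```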