Simultaneous linear connectivity of neural networks modulo permutation
CoRR (2024)
Abstract
Neural networks typically exhibit permutation symmetries which contribute to
the non-convexity of the networks' loss landscapes, since linearly
interpolating between two permuted versions of a trained network tends to
encounter a high loss barrier. Recent work has argued that permutation
symmetries are the only sources of non-convexity, meaning there are essentially
no such barriers between trained networks if they are permuted appropriately.
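To make the symmetry concrete, here is a minimal NumPy sketch; the tiny two-layer ReLU network and random data are illustrative assumptions, not the paper's experimental setup. It shows that permuting hidden units preserves the function exactly, while naively averaging the original and permuted weights does not, which is what produces a barrier along the interpolation path.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 8, 16, 256                       # input dim, hidden width, samples
X = rng.normal(size=(n, d))

# Tiny two-layer MLP: f(X) = relu(X @ W1) @ W2
W1 = rng.normal(size=(d, h))
W2 = rng.normal(size=(h, 1))

def f(X, W1, W2):
    return np.maximum(X @ W1, 0.0) @ W2

# Permuting the hidden units (columns of W1, rows of W2) leaves f unchanged.
perm = rng.permutation(h)
W1p, W2p = W1[:, perm], W2[perm, :]
assert np.allclose(f(X, W1, W2), f(X, W1p, W2p))

# But the midpoint of these two functionally identical weight settings is a
# different function: this mismatch is the source of the loss barrier along
# the linear interpolation path.
mid = f(X, (W1 + W1p) / 2, (W2 + W2p) / 2)
print(np.abs(mid - f(X, W1, W2)).mean())   # noticeably nonzero
```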
In this work, we refine these arguments into three distinct claims of
increasing strength. We show that existing evidence only supports "weak linear
connectivity": for each pair of networks in a set of SGD solutions, there
exists a permutation (in general a different one for each pair) that linearly
connects the two. In contrast, "strong linear connectivity", the claim that for
each network a single permutation simultaneously connects it with all the other
networks, is both intuitively and practically more desirable. This stronger
claim would imply that the loss landscape is convex after accounting for
permutation, and would enable linear interpolation between three or more
independently trained models without increased loss.
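As a concrete instance of finding such a per-pair permutation, below is a weight-matching sketch in the style of Git Re-Basin, solved with a linear assignment solver. The single-hidden-layer setting and the similarity score are illustrative assumptions; the paper's own alignment procedure may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_hidden_units(W1a, W2a, W1b, W2b):
    """Return a permutation of B's hidden units that best aligns B with A.

    sim[i, j] scores how well B's unit j matches A's unit i, comparing both
    incoming (columns of W1) and outgoing (rows of W2) weight vectors.
    """
    sim = W1a.T @ W1b + W2a @ W2b.T
    _, perm = linear_sum_assignment(-sim)   # negate: the solver minimizes cost
    return perm

# Usage: align network B to network A before interpolating.
rng = np.random.default_rng(1)
d, h = 8, 16
W1a, W2a = rng.normal(size=(d, h)), rng.normal(size=(h, 1))
W1b, W2b = rng.normal(size=(d, h)), rng.normal(size=(h, 1))
perm = match_hidden_units(W1a, W2a, W1b, W2b)
W1b_aligned, W2b_aligned = W1b[:, perm], W2b[perm, :]
# Weak linear connectivity says that, for trained (not random) networks of
# sufficient width, the segment from (W1a, W2a) to the aligned B stays low-loss.
```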
We then introduce an intermediate claim: for certain sequences of networks,
there exists one permutation that simultaneously aligns matching pairs of
networks from these sequences. Specifically, we discover that a single
permutation aligns sequences of iteratively trained as well as iteratively
pruned networks, meaning that the two networks exhibit low loss barriers at
each step of their optimization and sparsification trajectories, respectively.
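A sketch of how this simultaneous property could be tested on two training trajectories follows, assuming checkpoints are stored as (W1, W2) pairs. The matching heuristic and the choice to compute the permutation from the final checkpoints are illustrative, not the paper's exact protocol.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_perm(pa, pb):
    """One permutation from a weight-matching heuristic on a single pair."""
    sim = pa[0].T @ pb[0] + pa[1] @ pb[1].T
    return linear_sum_assignment(-sim)[1]

def barrier(pa, pb, X, y, steps=20):
    """Peak of the loss along the segment from pa to pb, measured above the
    linear interpolation of the endpoint losses."""
    def loss(p):
        return ((np.maximum(X @ p[0], 0.0) @ p[1] - y) ** 2).mean()
    la, lb = loss(pa), loss(pb)
    return max(
        loss(((1 - a) * pa[0] + a * pb[0], (1 - a) * pa[1] + a * pb[1]))
        - ((1 - a) * la + a * lb)
        for a in np.linspace(0.0, 1.0, steps + 1)
    )

def simultaneous_barriers(traj_a, traj_b, X, y):
    """One permutation, computed once from the final checkpoints and applied
    to every checkpoint of run B; returns the per-step barriers."""
    perm = align_perm(traj_a[-1], traj_b[-1])
    return [barrier(pa, (pb[0][:, perm], pb[1][perm, :]), X, y)
            for pa, pb in zip(traj_a, traj_b)]
```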
Finally, we provide the first evidence that strong linear connectivity may be
possible under certain conditions, by showing that loss barriers decrease with
increasing network width when interpolating among three networks.
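One way to probe that finding, sketched below, is to measure the worst-case loss over barycentric mixtures of three aligned networks and track how it scales with width. The grid resolution and the squared-error loss are illustrative choices.

```python
import numpy as np

def loss(p, X, y):
    return ((np.maximum(X @ p[0], 0.0) @ p[1] - y) ** 2).mean()

def simplex_barrier(p1, p2, p3, X, y, steps=10):
    """Max loss over convex combinations a*p1 + b*p2 + c*p3 (a + b + c = 1),
    above the matching combination of endpoint losses. Strong linear
    connectivity would drive this toward zero after one shared permutation."""
    ends = [loss(p, X, y) for p in (p1, p2, p3)]
    worst = -np.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            a, b = i / steps, j / steps
            c = 1.0 - a - b
            mixed = tuple(a * u + b * v + c * w
                          for u, v, w in zip(p1, p2, p3))
            worst = max(worst, loss(mixed, X, y)
                        - (a * ends[0] + b * ends[1] + c * ends[2]))
    return worst
```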