Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs
arxiv(2024)
摘要
Vision tasks are characterized by the properties of locality and translation
invariance. The superior performance of convolutional neural networks (CNNs) on
these tasks is widely attributed to the inductive bias of locality and weight
sharing baked into their architecture. Existing attempts to quantify the
statistical benefits of these biases in CNNs over locally connected
convolutional neural networks (LCNs) and fully connected neural networks (FCNs)
fall into one of the following categories: either they disregard the optimizer
and only provide uniform convergence upper bounds with no separating lower
bounds, or they consider simplistic tasks that do not truly mirror the locality
and translation invariance as found in real-world vision tasks. To address
these deficiencies, we introduce the Dynamic Signal Distribution (DSD)
classification task that models an image as consisting of k patches, each of
dimension d, and the label is determined by a d-sparse signal vector that
can freely appear in any one of the k patches. On this task, for any
orthogonally equivariant algorithm like gradient descent, we prove that CNNs
require Õ(k+d) samples, whereas LCNs require Ω(kd) samples,
establishing the statistical advantages of weight sharing in translation
invariant tasks. Furthermore, LCNs need Õ(k(k+d)) samples, compared
to Ω(k^2d) samples for FCNs, showcasing the benefits of locality in
local tasks. Additionally, we develop information theoretic tools for analyzing
randomized algorithms, which may be of interest for statistical research.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要