Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
CoRR(2024)
Abstract
Spiking Neural Networks (SNNs), known for their biologically plausible
architecture, face the challenge of limited performance. The self-attention
mechanism, which is the cornerstone of the high-performance Transformer and
also a biologically inspired structure, is absent in existing SNNs. To this
end, we explore the potential of leveraging both self-attention capability and
biological properties of SNNs, and propose a novel Spiking Self-Attention (SSA)
and Spiking Transformer (Spikformer). The SSA mechanism eliminates the need for
softmax and captures the sparse visual feature employing spike-based Query,
Key, and Value. This sparse computation without multiplication makes SSA
efficient and energy-saving. Further, we develop a Spiking Convolutional Stem
(SCS) with supplementary convolutional layers to enhance the architecture of
Spikformer. The Spikformer enhanced with the SCS is referred to as Spikformer
V2. To train larger and deeper Spikformer V2, we introduce a pioneering
exploration of Self-Supervised Learning (SSL) within the SNN. Specifically, we
pre-train Spikformer V2 with masking and reconstruction style inspired by the
mainstream self-supervised Transformer, and then finetune the Spikformer V2 on
the image classification on ImageNet. Extensive experiments show that
Spikformer V2 outperforms other previous surrogate training and ANN2SNN
methods. An 8-layer Spikformer V2 achieves an accuracy of 80.38
steps, and after SSL, a 172M 16-layer Spikformer V2 reaches an accuracy of
81.10
time that the SNN achieves 80+
available at Spikformer V2.
MoreTranslated text
AI Read Science
Must-Reading Tree
Example
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined