
Spikingformer: Spike-driven Residual Learning for Transformer-based Spiking Neural Network, arXiv 2023

Spikingformer is a purely event-driven transformer-based spiking neural network. It achieves 75.85% top-1 accuracy on ImageNet-1K, improving on Spikformer by 1.04% while reducing energy consumption by 57.34%. To the best of our knowledge, this is the first purely event-driven transformer-based SNN (April 2023).

<p align="center"> <img src="https://github.com/zhouchenlin2096/Spikingformer/blob/master/imgs/Spikingformer-Architecture.png"> </p>
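The core idea named in the title is spike-driven residual learning: the residual addition is performed on real-valued (pre-neuron) features, and every convolution receives only binary spikes from the preceding spiking neuron, so the multiply-accumulate work stays spike-driven. Below is a minimal sketch of that block structure, not the repository's code; the `SpikeFn` placeholder stands in for a proper (multi-step) LIF neuron with a surrogate gradient, e.g. from SpikingJelly.

```python
# Minimal sketch of the spike-driven residual idea: out = x + BN(Conv(SN(x))),
# so the Conv only ever sees 0/1 spike maps while the residual add happens on
# real-valued features. Illustrative only; not the repository's implementation.
import torch
import torch.nn as nn

class SpikeFn(nn.Module):
    """Placeholder spiking neuron: a Heaviside step at a fixed threshold.
    A real implementation would use an LIF neuron with a surrogate gradient."""
    def __init__(self, threshold: float = 1.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x >= self.threshold).float()

class SpikeDrivenResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.sn = SpikeFn()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Conv input is binary; residual addition is on the real-valued path.
        return x + self.bn(self.conv(self.sn(x)))

# Usage: a single time step of shape (batch, channels, height, width).
block = SpikeDrivenResidualBlock(channels=64)
y = block(torch.randn(2, 64, 32, 32))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```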

News

[2024.2.23] Added energy_consumption_calculation of Spikingformer and Spikformer on ImageNet.

[2023.9.11] Added origin_logs and the CIFAR10 trained model.

[2023.8.18] Uploaded trained models.

Reference

If you find this repo useful, please consider citing:

@article{zhou2023spikingformer,
  title={Spikingformer: Spike-driven Residual Learning for Transformer-based Spiking Neural Network},
  author={Zhou, Chenlin and Yu, Liutao and Zhou, Zhaokun and Zhang, Han and Ma, Zhengyu and Zhou, Huihui and Tian, Yonghong},
  journal={arXiv preprint arXiv:2304.11954},
  year={2023},
  url={https://arxiv.org/abs/2304.11954}
}

@article{zhou2024direct,
  title={Direct training high-performance deep spiking neural networks: a review of theories and methods},
  author={Zhou, Chenlin and Zhang, Han and Yu, Liutao and Ye, Yumin and Zhou, Zhaokun and Huang, Liwei and Ma, Zhengyu and Fan, Xiaopeng and Zhou, Huihui and Tian, Yonghong},
  journal={Frontiers in Neuroscience},
  volume={18},
  pages={1383844},
  year={2024},
  publisher={Frontiers Media SA}
}

Main results on ImageNet-1K

| Model | Resolution | T | Param. | FLOPs | Power | Top-1 Acc (%) | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Spikingformer-8-384 | 224x224 | 4 | 16.81M | 3.88G | 4.69 mJ | 72.45 | - |
| Spikingformer-8-512 | 224x224 | 4 | 29.68M | 6.52G | 7.46 mJ | 74.79 | - |
| Spikingformer-8-768 | 224x224 | 4 | 66.34M | 12.54G | 13.68 mJ | 75.85 | here |

All download passwords: abcd

<!-- | Spikformer-8-384 | 224x224 | 16.81M | 6.82G | 12.43 mJ |70.24 | | Spikformer-8-512 | 224x224 | 29.68M | 11.09G | 18.82 mJ |73.38 | | Spikformer-8-768 | 224x224 | 66.34M | 22.09G | 32.07 mJ |74.81 | -->

Main results on CIFAR10/CIFAR100

| Model | T | Param. | CIFAR10 Top-1 Acc (%) | Download | CIFAR100 Top-1 Acc (%) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Spikingformer-4-256 | 4 | 4.15M | 94.77 | - | 77.43 |
| Spikingformer-2-384 | 4 | 5.76M | 95.22 | - | 78.34 |
| Spikingformer-4-384 | 4 | 9.32M | 95.61 | - | 79.09 |
| Spikingformer-4-384-400E | 4 | 9.32M | 95.81 | here | 79.21 |

All download passwords: abcd

Main results on CIFAR10-DVS/DVS128

| Model | T | Param. | CIFAR10-DVS Top-1 Acc (%) | DVS128 Top-1 Acc (%) |
| :---: | :---: | :---: | :---: | :---: |
| Spikingformer-2-256 | 10 | 2.57M | 79.9 | 96.2 |
| Spikingformer-2-256 | 16 | 2.57M | 81.3 | 98.3 |

Requirements

timm==0.6.12; cupy==11.4.0; torch==1.12.1; spikingjelly==0.0.0.0.12; pyyaml;

Data preparation: ImageNet with the following folder structure; you can extract ImageNet using this script (a minimal loading sketch follows the tree below).

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
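As a quick sanity check of the layout above, the train/ and val/ directories can be read with torchvision's ImageFolder. This is a sketch only; the path and transform are placeholders, and the actual training pipeline builds its own loaders (via timm).

```python
# Sanity-check the ImageNet folder layout with torchvision.
# "/path/to/imagenet" is a placeholder; replace it with your local path.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=transform)
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=transform)
print(len(train_set.classes), "classes,", len(train_set), "training images")
```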

Train

Training on ImageNet

Set the hyper-parameters in imagenet.yml, then run:

cd imagenet
python -m torch.distributed.launch --nproc_per_node=8 train.py

Testing ImageNet Val data

First download the trained model here (password: abcd), then run:

cd imagenet
python test.py

Training on CIFAR10

Set the hyper-parameters in cifar10.yml, then run:

cd cifar10
python train.py

Training on CIFAR100

Set the hyper-parameters in cifar100.yml, then run:

cd cifar10
python train.py

Training on DVS128 Gesture

cd dvs128-gesture
python train.py

Training on CIFAR10-DVS

cd cifar10-dvs
python train.py

Energy Consumption Calculation on ImageNet

First download the trained model here (password: abcd), then run:

cd imagenet
python energy_consumption_calculation_on_imagenet.py

Correction of a Writing Error in the Manuscript

For neuromorphic datasets, the preprocessing (transforming events into frames) follows SEW or SpikingJelly. The event stream has four dimensions: the event's coordinates (x, y), time (t), and polarity (p). We split the N events into T slices (T is the simulation time-step) with nearly the same number of events in each slice, and integrate each slice into a frame. Unfortunately, Equation 20 in the manuscript contains a formula mistake; we correct it as follows: $$E_{Spikingformer}^{neuro}=E_{AC}\times\left(\sum_{i=2}^{N} SOP_{Conv}^{i}+\sum_{j=1}^{M} SOP_{SSA}^{j}\right)+E_{MAC}\times FLOP_{Conv}^{1}$$
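For clarity, the corrected formula corresponds to a calculation like the sketch below. The per-operation energies (45 nm CMOS values commonly used in this line of work) and the operation counts are assumptions/placeholders, not measured values from the repository's script.

```python
# Sketch of the corrected neuromorphic-data energy formula:
# E = E_AC * (SOPs of Conv layers 2..N + SOPs of the M SSA blocks)
#     + E_MAC * FLOPs of the first Conv layer (which receives real-valued frames).
E_MAC = 4.6e-12  # J per multiply-accumulate (assumed 45 nm constant)
E_AC = 0.9e-12   # J per accumulate (assumed 45 nm constant)

def spikingformer_energy_neuro(flops_conv1, sops_conv_rest, sops_ssa):
    """flops_conv1: FLOPs of the first Conv layer (real-valued input frames).
    sops_conv_rest: synaptic operation counts of Conv layers 2..N.
    sops_ssa: synaptic operation counts of the M SSA blocks."""
    return E_AC * (sum(sops_conv_rest) + sum(sops_ssa)) + E_MAC * flops_conv1

# Placeholder counts, purely to show the shape of the calculation:
energy_joules = spikingformer_energy_neuro(
    flops_conv1=1.0e8,
    sops_conv_rest=[5.0e7, 4.0e7],
    sops_ssa=[2.0e7, 1.5e7],
)
print(f"{energy_joules * 1e3:.3f} mJ")
```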

Acknowledgement & Contact Information

Related projects: spikformer, pytorch-image-models, CML, spikingjelly.

For help or problems using this repository, please submit a GitHub issue.

For other communications related to this repository, please contact zhouchl@pcl.ac.cn or zhouchenlin19@mails.ucas.ac.cn.