State Transition of Dendritic Spines Improves Learning of Sparse Spiking Neural Networks
This repo contains the code for reproducing the results of STDS (State Transition of Dendritic Spines) in this paper. It is modified from the open-source code of SEW ResNet.
Directory Tree
.
├── CIFAR10
│   ├── model.py
│   ├── optim.py
│   ├── train.py
│   └── logs
└── ImageNet
    ├── optim.py
    ├── sew_resnet.py
    ├── train.py
    ├── utils.py
    └── logs
        ├── linear
        └── sine
Dependencies
The major dependencies of this code are listed below.
Name | Version |
---|---|
cudatoolkit | 10.2.89 |
cudnn | 8.2.1.32 |
cupy | 9.6.0 |
numpy | 1.21.4 |
python | 3.7.11 |
pytorch | 1.9.1 |
spikingjelly | <Specific Version> |
tensorboard | 2.7.0 |
torchvision | 0.10.1 |
Note: the required version of SpikingJelly is specified in the Usage section.
Environment
Running this code requires an NVIDIA GPU. It has been tested with CUDA 10.2 on Ubuntu 16.04. The hardware platform used in the experiments is listed below.
- GPU: Tesla V100-SXM3-32GB 350 Watts version
- CPU: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
Each trial on ImageNet requires 8 GPUs. For CIFAR-10, each trial requires only a single GPU.
Usage
This code requires a specific version of the open-source SNN framework SpikingJelly. To install it, first clone the repo from GitHub:
$ git clone https://github.com/fangwei123456/spikingjelly.git
Then check out the commit we use in these experiments and install it:
$ cd spikingjelly
$ git checkout d8cc6a5
$ python setup.py install
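If the installation succeeded, SpikingJelly should be importable from Python. The snippet below is a minimal sanity check; it assumes nothing beyond a successful install into the active environment.

```python
# Minimal sanity check for the SpikingJelly installation.
# It assumes nothing beyond a successful `python setup.py install`
# into the active Python environment.
import spikingjelly

# Print the install location to confirm the checked-out copy is the one in use.
print("SpikingJelly imported from:", spikingjelly.__file__)
```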
With the dependencies mentioned above installed, you should be able to run the following commands:
ImageNet
Dense training:
$ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --cos_lr_T 320 --model sew_resnet18 -b 32 --output-dir <log dir> --tb --print-freq 4096 --amp --connect_f ADD --T 4 --lr 0.1 --epoch 320 --data-path <dataset path> --sparse-function identity
Our proposed algorithm:
$ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --cos_lr_T 320 --model sew_resnet18 -b 32 --output-dir <log dir> --tb --print-freq 4096 --amp --connect_f ADD --T 4 --lr 0.1 --epoch 320 --data-path <dataset path> --sparse-function stmod --flat-width <D> --gradual <scheduler type>
Grad R:
$ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --cos_lr_T 320 --model sew_resnet18 -b 32 --output-dir <log dir> --tb --print-freq 4096 --amp --connect_f ADD --T 4 --lr 0.1 --epoch 320 --alpha-gr <alpha in Grad R> --data-path <dataset path> --sparse-function stmod --flat-width <mu in Grad R>
The TensorBoard logs and checkpoints will be placed in two separate directories under ./logs.
Running Arguments
Arguments | Descriptions | Value | Type |
---|---|---|---|
--cos_lr_T | Total steps of the cosine annealing learning-rate scheduler | 320 | int |
-b, --batch-size | Training batch size | 32 | int |
--alpha-gr | Hyperparameter $\alpha$ in Grad R | None | float |
--data-path | Path of datasets | | str |
--output-dir | Path for dumping models and logs | | str |
--print-freq | Frequency of status printing during training | 4096 | int |
--amp | Whether to use mixed precision training | | bool |
--connect_f | Connection function of SEW ResNet | ADD | str |
-T | Simulation time-steps of SNNs | 4 | int |
--lr | Learning rate | 0.1 | float |
--epoch | Number of training epochs | 320 | int |
--sparse-function | Reparameterization function | 'stmod' for pruning, 'identity' for training dense model | str |
--flat-width | Hyperparameter $D$ in our work and $\mu$ in Grad R | | float |
--gradual | Scheduler type for gradual pruning | 'sine', 'linear' | str |
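The --gradual option appears to select the shape of the schedule by which the flat width is ramped toward its target value <D> during training (the two log subdirectories, linear and sine, correspond to the two choices). The sketch below is only an illustration of what linear and sine ramps could look like; the function name `flat_width_schedule`, the exact curve shapes, and the example values are assumptions, so refer to optim.py for the schedules actually used.

```python
import math

def flat_width_schedule(step: int, total_steps: int, final_width: float, mode: str) -> float:
    """Illustrative ramp of the flat width from 0 to `final_width`.

    `mode` mirrors the --gradual choices ('linear' or 'sine'). The exact
    curves used by this repo live in optim.py/train.py; this sketch only
    shows the general idea of a gradually growing threshold.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    if mode == "linear":
        # Straight-line ramp from 0 to final_width.
        return final_width * progress
    if mode == "sine":
        # One plausible sine-shaped ramp that flattens out near the end of training.
        return final_width * math.sin(0.5 * math.pi * progress)
    raise ValueError(f"unknown scheduler type: {mode}")

# Example (arbitrary values): flat width halfway through a 320-step schedule.
print(flat_width_schedule(160, 320, final_width=1.0, mode="linear"))
print(flat_width_schedule(160, 320, final_width=1.0, mode="sine"))
```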
CIFAR-10
Dense training:
$ python train.py --dataset-dir <dataset path> --dump-dir . --sparse-function identity --amp
Our proposed algorithm:
$ python train.py --dataset-dir <dataset path> --dump-dir . --sparse-function stmod --gradual <scheduler type> --flat-width <D> --amp
Running Arguments
Arguments | Descriptions | Value | Type |
---|---|---|---|
-b, --batch-size | Training batch size | 16 | int |
--lr | Learning rate | 1e-4 | float |
--dataset-dir | Path of datasets | | str |
--dump-dir | Path for dumping models and logs | | str |
-T | Simulation time-steps of SNNs | 8 | int |
-N, --epoch | Number of training epochs | 2048 | int |
-test | Whether to run test only | | bool |
--amp | Whether to use mixed precision training | | bool |
--sparse-function | Reparameterization function | 'stmod' for pruning, 'identity' for training dense model | str |
--flat-width | Hyperparameter $D$ in our work and $\mu$ in Grad R | | float |
--gradual | Scheduler type for gradual pruning | 'sine', 'linear' | str |
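For intuition on --sparse-function stmod and --flat-width: the reparameterization keeps a flat region of width $D$ around zero, so parameters inside that region contribute exactly zero weight and the corresponding connections are pruned during training. The snippet below is only an illustrative soft-threshold form; the function name and the exact mapping are our assumptions, not the repo's definition, which lives in optim.py.

```python
import torch

def soft_threshold_reparam(theta: torch.Tensor, flat_width: float) -> torch.Tensor:
    """Schematic soft-threshold reparameterization with a flat region of width `flat_width`.

    Parameters with |theta| <= flat_width map to exactly zero (pruned connections);
    the remainder are shrunk toward zero. This is an illustrative stand-in for
    --sparse-function stmod, not the repo's exact definition (see optim.py).
    """
    return torch.sign(theta) * torch.clamp(theta.abs() - flat_width, min=0.0)

# Example: with flat width D = 0.05, the small entries become exactly zero.
theta = torch.tensor([-0.20, -0.03, 0.00, 0.04, 0.30])
print(soft_threshold_reparam(theta, flat_width=0.05))
```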
Citation
Please cite the following paper if this work is useful for your research.
@InProceedings{pmlr-v162-chen22ac,
title = {State Transition of Dendritic Spines Improves Learning of Sparse Spiking Neural Networks},
author = {Chen, Yanqi and Yu, Zhaofei and Fang, Wei and Ma, Zhengyu and Huang, Tiejun and Tian, Yonghong},
booktitle = {Proceedings of the 39th International Conference on Machine Learning},
pages = {3701--3715},
year = {2022},
editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
volume = {162},
series = {Proceedings of Machine Learning Research},
month = {17--23 Jul},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v162/chen22ac/chen22ac.pdf},
url = {https://proceedings.mlr.press/v162/chen22ac.html}
}