Fast-SNN

This repository holds the code for Fast-SNN.

Dependencies

Prepare Quantized ANNs

For training quantized ANNs, we follow the protocol defined in "Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks".

For more details, please refer to APoT_Quantization.

Image Classification

CIFAR-10

Architectures

For network architectures, we currently support AlexNet, VGG11, and ResNet-20/32/44/56/110 (in 'CIFAR10'), and ResNet-18 (in 'CIFAR10_resnet18'). For AlexNet, VGG11, and ResNet-20/32/44/56/110, we quantize both weights and activations. For ResNet-18, we quantize only activations.

Dataset

By default, the dataset is expected to be in a 'data' folder at the same level as 'main.py'.

Train Quantized ANNs

We progressively train full-precision, 4-bit, 3-bit, and 2-bit ANN models, initializing each lower-precision model from the best checkpoint of the previous one.

An example to train AlexNet:

python main.py --arch alex --bit 32 --wd 5e-4
python main.py --arch alex --bit 4 --wd 1e-4  --lr 4e-2 --init result/alex_32bit/model_best.pth.tar
python main.py --arch alex --bit 3 --wd 1e-4  --lr 4e-2 --init result/alex_4bit/model_best.pth.tar
python main.py --arch alex --bit 2 --wd 3e-5  --lr 4e-2 --init result/alex_3bit/model_best.pth.tar
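The chain above can also be generated programmatically. The following is a minimal sketch, not part of the repository: the helper names (`STAGES`, `checkpoint_path`, `build_commands`) are hypothetical, and the hyperparameters are copied from the AlexNet example only. Whether the same values suit other architectures is not stated here.

```python
# Per-stage settings copied from the AlexNet example; lr=None means the flag is not passed.
STAGES = [
    {"bit": 32, "wd": "5e-4", "lr": None},
    {"bit": 4, "wd": "1e-4", "lr": "4e-2"},
    {"bit": 3, "wd": "1e-4", "lr": "4e-2"},
    {"bit": 2, "wd": "3e-5", "lr": "4e-2"},
]


def checkpoint_path(arch, bit):
    """Best-checkpoint path, following the naming used in the example commands."""
    return f"result/{arch}_{bit}bit/model_best.pth.tar"


def build_commands(arch, stages=STAGES):
    """Return one argument list per stage; each stage after the first
    is initialized from the previous stage's best checkpoint."""
    commands = []
    prev_bit = None
    for stage in stages:
        cmd = ["python", "main.py", "--arch", arch,
               "--bit", str(stage["bit"]), "--wd", stage["wd"]]
        if stage["lr"] is not None:
            cmd += ["--lr", stage["lr"]]
        if prev_bit is not None:
            cmd += ["--init", checkpoint_path(arch, prev_bit)]
        commands.append(cmd)
        prev_bit = stage["bit"]
    return commands


if __name__ == "__main__":
    for cmd in build_commands("alex"):
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` so that a failed stage stops the chain before a missing checkpoint is used.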

Evaluate Converted SNNs

The number of SNN time steps T is calculated automatically from the activation precision b (in bits), i.e., T = 2^b - 1. By default, we use the signed IF neuron model.
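As a quick illustration of the relation between bit width and time steps, the sketch below computes T for the precisions used in this README. The function names are illustrative and not taken from the repository. It also shows why the formula has this form: over T steps a neuron can emit between 0 and T spikes, which gives T + 1 = 2^b distinct counts, one per level of a b-bit activation.

```python
def time_steps(bits):
    """Number of SNN time steps for a given activation precision: T = 2^b - 1."""
    if bits < 1:
        raise ValueError("bits must be a positive integer")
    return 2 ** bits - 1


def spike_counts(bits):
    """All spike counts a neuron can produce in T steps: 0, 1, ..., T."""
    return list(range(time_steps(bits) + 1))


if __name__ == "__main__":
    for b in (4, 3, 2):
        # Prints: bits=4 T=15 levels=16, bits=3 T=7 levels=8, bits=2 T=3 levels=4
        print(f"bits={b} T={time_steps(b)} levels={len(spike_counts(b))}")
```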

optional arguments:
    -u                     Use unsigned IF neuron model

Example: AlexNet (SNN) performance with the traditional unsigned IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.

python snn.py --arch alex --bit 3 -e -u --init result/alex_3bit/model_best.pth.tar
python snn.py --arch alex --bit 2 -e -u --init result/alex_2bit/model_best.pth.tar

Example: AlexNet (SNN) performance with the signed IF neuron model (the default, so -u is omitted). A 3/2-bit ANN is converted to an SNN with T=7/3.

python snn.py --arch alex --bit 3 -e --init result/alex_3bit/model_best.pth.tar
python snn.py --arch alex --bit 2 -e --init result/alex_2bit/model_best.pth.tar
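To make the difference between the two neuron models concrete, here is a toy sketch. It is an assumption-laden illustration, not the code in snn.py: this README does not define the neuron models, so the sketch uses a generic integrate-and-fire neuron with reset by subtraction, and assumes the signed variant additionally emits a -1 spike when the membrane potential reaches the negative threshold. The repository's firing condition may differ.

```python
def run_if_neuron(inputs, threshold=1.0, signed=True):
    """Simulate one IF neuron over len(inputs) time steps.

    Returns the spike train as a list of +1, 0, or -1 (the latter only if signed).
    Assumed formulation: reset by subtraction, symmetric negative threshold.
    """
    v = 0.0  # membrane potential
    spikes = []
    for x in inputs:
        v += x  # integrate the input current
        if v >= threshold:
            spikes.append(1)
            v -= threshold  # reset by subtraction
        elif signed and v <= -threshold:
            spikes.append(-1)  # negative spike cancels an earlier surplus spike
            v += threshold
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # Net input is 1.0, so the ideal spike count is 1.
    currents = [1.5, 1.5, -2.0]
    print(run_if_neuron(currents, signed=False))  # [1, 1, 0]  -> count 2 (over-fires)
    print(run_if_neuron(currents, signed=True))   # [1, 1, -1] -> count 1
```

In this toy case the unsigned neuron cannot take back the spikes it emitted before the negative input arrived, while the signed neuron corrects its count with a negative spike.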

Fine-tune Converted SNNs

By default, we use the signed IF neuron model during fine-tuning.

optional arguments:
    --num_epochs / -n               Number of epochs to fine-tune at each layer
                                    default: 1
    --force                         Always update fine-tuned parameters without evaluation on training data
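The two options above suggest a layer-by-layer loop in which fine-tuned parameters are kept only if they do not hurt performance on the training data, unless --force is given. The sketch below is a guess at that control flow, inferred from the option descriptions only. `tune_fn` and `eval_fn` are hypothetical callables and the acceptance rule is an assumption; none of this is taken from snn_ft.py.

```python
def finetune_layerwise(layers, tune_fn, eval_fn, num_epochs=1, force=False):
    """Fine-tune one layer at a time.

    layers:  dict mapping layer name -> parameters (any object)
    tune_fn: hypothetical callable (name, params, num_epochs) -> new params
    eval_fn: hypothetical callable (layers) -> score on training data (higher is better)
    force:   if True, always keep the fine-tuned parameters (no evaluation)
    """
    for name in list(layers):
        candidate = tune_fn(name, layers[name], num_epochs)
        if force:
            layers[name] = candidate
            continue
        baseline = eval_fn(layers)
        trial = dict(layers)
        trial[name] = candidate
        # Assumed rule: keep the update only if the score does not drop.
        if eval_fn(trial) >= baseline:
            layers[name] = candidate
    return layers


if __name__ == "__main__":
    # Toy stand-ins: parameters are numbers and the score is their sum.
    tune = lambda name, p, n: p + n if name == "conv1" else p - n
    score = lambda ls: sum(ls.values())
    print(finetune_layerwise({"conv1": 1, "conv2": 1}, tune, score))              # {'conv1': 2, 'conv2': 1}
    print(finetune_layerwise({"conv1": 1, "conv2": 1}, tune, score, force=True))  # {'conv1': 2, 'conv2': 0}
```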

Example: fine-tune converted SNN models.

python snn_ft.py --arch alex --bit 2 --force --init result/alex_2bit/model_best.pth.tar
python snn_ft.py --arch resnet18 --bit 2 --force --init result/resnet18_2bit/model_best.pth.tar
python snn_ft.py --arch resnet56 --bit 2 -n 8 --init result/resnet56_2bit/model_best.pth.tar

Checkpoints for Quantized Models

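| Model    | 3-bit         | 2-bit         |
|----------|---------------|---------------|
| AlexNet  | alex_3bit     | alex_2bit     |
| VGG11    | vgg11_3bit    | vgg11_2bit    |
| ResNet20 | resnet20_3bit | resnet20_2bit |
| ResNet44 | resnet44_3bit | resnet44_2bit |
| ResNet56 | resnet56_3bit | resnet56_2bit |
| ResNet18 | resnet18_3bit | resnet18_2bit |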

ImageNet

We use distributed data parallel (DDP) for training. Please refer to PyTorch DDP for details.

To speed up data loading, we replace the vanilla PyTorch dataloader with nvidia-dali.

Nvidia-dali package

# for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
# for CUDA 11
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
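When the installation is scripted, the choice between the two packages can be made from the CUDA major version. This is a small sketch with hypothetical helper names. It covers only the two packages listed above and deliberately refuses other CUDA versions rather than guessing a package name.

```python
DALI_INDEX = "https://developer.download.nvidia.com/compute/redist"

# Only the two packages named in this README; other versions are intentionally absent.
DALI_PACKAGES = {
    10: "nvidia-dali-cuda100",
    11: "nvidia-dali-cuda110",
}


def dali_install_command(cuda_major):
    """Return the pip command for the given CUDA major version (10 or 11)."""
    try:
        package = DALI_PACKAGES[cuda_major]
    except KeyError:
        raise ValueError(
            f"No nvidia-dali package listed here for CUDA {cuda_major}"
        ) from None
    return f"pip install --extra-index-url {DALI_INDEX} {package}"


if __name__ == "__main__":
    print(dali_install_command(11))
```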

For more details on nvidia-dali, please refer to the official NVIDIA DALI Documentation.

Architectures

For network architectures, we currently support AlexNet and VGG16.

Train Quantized ANNs

Starting from full-precision pre-trained models from TorchVision, we progressively train 4-bit, 3-bit, and 2-bit ANN models.

An example to train AlexNet:

python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 4 --workers 4 --lr=0.1 --epochs 60 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 3 --init result/alexnet_4bit/model_best.pth.tar --workers 4 --lr=0.01 --epochs 60 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 2 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --lr=0.01 --epochs 60 --dali_cpu /data/imagenet2012

Evaluate Converted SNNs

Example: AlexNet (SNN) performance with the traditional unsigned IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.

python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e -u --bit 3 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e -u --bit 2 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012

Example: AlexNet (SNN) performance with the signed IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.

python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e --bit 3 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e --bit 2 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012

Fine-tune Converted SNNs

By default, we use the signed IF neuron model during fine-tuning.

Example:

python -m torch.distributed.launch --nproc_per_node=4 snn_ft.py -a alexnet -b 128 --bit 3 -n 8 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn_ft.py -a alexnet -b 128 --bit 2 -n 8 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012

Checkpoints for Quantized Models

| Model   | 3-bit        | 2-bit        |
|---------|--------------|--------------|
| AlexNet | alexnet_3bit | alexnet_2bit |
| VGG16   | vgg16_3bit   | vgg16_2bit   |

Object Detection

We use yolov2-yolov3_PyTorch as the framework for object detection.

Preparation

For required packages and datasets, please refer to the README in yolov2-yolov3_PyTorch. The 'object detection' folder also contains a merged README detailing everything.

Architecture

We currently support Tiny YOLO and YOLOv2 with a ResNet-34 backbone.

optional arguments:
    --version / -v               Supported architecture
                                 available: yolov2_tiny, yolov2_r34

PASCAL VOC 2007

Train Quantized ANNs

Example: train Tiny YOLO with activations quantized to 32/4/3/2 bits.

python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 32
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 4 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 3 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 2 --init CHECKPOINT_PATH

Evaluate Models

optional arguments:
    --spike               Evaluate with spikes (as SNNs)

Example: evaluate Tiny YOLO (SNN) with T = 15, 7, 3

python eval.py -d voc --cuda -v yolov2_tiny --bit 4 --spike --init CHECKPOINT_PATH
python eval.py -d voc --cuda -v yolov2_tiny --bit 3 --spike --init CHECKPOINT_PATH
python eval.py -d voc --cuda -v yolov2_tiny --bit 2 --spike --init CHECKPOINT_PATH

Checkpoints for Quantized Models

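| Model              | 4-bit            | 3-bit            | 2-bit            |
|--------------------|------------------|------------------|------------------|
| Tiny YOLO          | yolov2_tiny_4bit | yolov2_tiny_3bit | yolov2_tiny_2bit |
| YOLOv2 (ResNet-34) | yolov2_r34_4bit  | yolov2_r34_3bit  | yolov2_r34_2bit  |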

MS COCO 2017

Train Quantized ANNs

Example: train Tiny YOLO with activations quantized to 32/4/3/2 bits.

python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 32 -ms --ema --sybn --batch_size 4 
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 4 -ms --ema --sybn --batch_size 4  --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 3 -ms --ema --sybn --batch_size 4 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 2 -ms --ema --sybn --batch_size 4 --init CHECKPOINT_PATH

Evaluate Models

Example: evaluate Tiny YOLO (SNN) with T = 15, 7, 3

python eval.py -d coco-val --cuda -v yolov2_tiny --bit 4 --spike --init CHECKPOINT_PATH
python eval.py -d coco-val --cuda -v yolov2_tiny --bit 3 --spike --init CHECKPOINT_PATH
python eval.py -d coco-val --cuda -v yolov2_tiny --bit 2 --spike --init CHECKPOINT_PATH 

Checkpoints for Quantized Models

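| Model              | 4-bit            | 3-bit            | 2-bit            |
|--------------------|------------------|------------------|------------------|
| Tiny YOLO          | yolov2_tiny_4bit | yolov2_tiny_3bit | yolov2_tiny_2bit |
| YOLOv2 (ResNet-34) | yolov2_r34_4bit  | yolov2_r34_3bit  | yolov2_r34_2bit  |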

Semantic Segmentation

We use vedaseg, an open source semantic segmentation toolbox based on PyTorch, as the framework for semantic segmentation.

Preparation

For required packages and datasets, please refer to the README in vedaseg. The 'semantic segmentation' folder also contains a merged README detailing everything.

Architecture

We currently support Deeplabv1 (VGG9) and Deeplabv3 (ResNet-34 + ASPP).

PASCAL VOC 2012

Train Quantized ANNs

Example: train VGG9 with activations quantized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/voc_deeplabv1.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv1_4bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv1_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv1_2bit.py "0, 1, 2, 3" 

Example: train ResNet-34 + ASPP with activations quantized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/voc_deeplabv3.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv3_4bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv3_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/voc_deeplabv3_2bit.py "0, 1, 2, 3" 

Evaluate Models

Example: evaluate VGG9 (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/voc_deeplabv1_T15.py './workdir/voc_deeplabv1_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv1_T7.py './workdir/voc_deeplabv1_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv1_T3.py './workdir/voc_deeplabv1_2bit/best_mIoU.pth' "0, 1, 2, 3" 

Example: evaluate ResNet-34 + ASPP (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/voc_deeplabv3_T15.py './workdir/voc_deeplabv3_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv3_T7.py './workdir/voc_deeplabv3_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/voc_deeplabv3_T3.py './workdir/voc_deeplabv3_2bit/best_mIoU.pth' "0, 1, 2, 3" 
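The evaluation commands pair a T-specific config with the checkpoint of the matching bit width (T = 2^b - 1, so 4/3/2 bits map to T15/T7/T3). The sketch below builds such a command from the bit width. The function name is hypothetical, and the file naming pattern is inferred from the examples in this section, so it may not hold for configs not listed here.

```python
def eval_command(dataset, model, bits, gpus="0, 1, 2, 3"):
    """Build a dist_test.sh command for a b-bit checkpoint evaluated as an SNN.

    dataset: e.g. "voc"; model: e.g. "deeplabv1" or "deeplabv3".
    The naming pattern is inferred from the README examples (an assumption).
    """
    t = 2 ** bits - 1  # time steps implied by the activation precision
    config = f"configs/{dataset}_{model}_T{t}.py"
    checkpoint = f"./workdir/{dataset}_{model}_{bits}bit/best_mIoU.pth"
    return f"bash ./tools/dist_test.sh {config} '{checkpoint}' \"{gpus}\""


if __name__ == "__main__":
    for b in (4, 3, 2):
        print(eval_command("voc", "deeplabv3", b))
```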

Checkpoints for Quantized Models

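| Model            | 4-bit              | 3-bit              | 2-bit              |
|------------------|--------------------|--------------------|--------------------|
| VGG-9            | voc_deeplabv1_4bit | voc_deeplabv1_3bit | voc_deeplabv1_2bit |
| ResNet-34 + ASPP | voc_deeplabv3_4bit | voc_deeplabv3_3bit | voc_deeplabv3_2bit |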

MS COCO 2017

Train Quantized ANNs

Example: train VGG9 with activations quantized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/coco_deeplabv1.py "0, 1, 2, 3, 6, 7" 
bash ./tools/dist_train.sh configs/coco_deeplabv1_4bit.py "0, 1, 2, 3, 6, 7" 
bash ./tools/dist_train.sh configs/coco_deeplabv1_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv1_2bit.py "0, 1, 2, 3" 

Example: train ResNet-34 + ASPP with activations quantized to 32/4/3/2 bits.

bash ./tools/dist_train.sh configs/coco_deeplabv3.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv3_4bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv3_3bit.py "0, 1, 2, 3" 
bash ./tools/dist_train.sh configs/coco_deeplabv3_2bit.py "0, 1, 2, 3" 

Evaluate Models

Example: evaluate VGG9 (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/coco_deeplabv1_T15.py './workdir/coco_deeplabv1_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv1_T7.py './workdir/coco_deeplabv1_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv1_T3.py './workdir/coco_deeplabv1_2bit/best_mIoU.pth' "0, 1, 2, 3" 

Example: evaluate ResNet-34 + ASPP (SNN) with T = 15, 7, 3

bash ./tools/dist_test.sh configs/coco_deeplabv3_T15.py './workdir/coco_deeplabv3_4bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv3_T7.py './workdir/coco_deeplabv3_3bit/best_mIoU.pth' "0, 1, 2, 3" 
bash ./tools/dist_test.sh configs/coco_deeplabv3_T3.py './workdir/coco_deeplabv3_2bit/best_mIoU.pth' "0, 1, 2, 3" 

Checkpoints for Quantized Models

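| Model            | 4-bit               | 3-bit               | 2-bit               |
|------------------|---------------------|---------------------|---------------------|
| VGG-9            | coco_deeplabv1_4bit | coco_deeplabv1_3bit | coco_deeplabv1_2bit |
| ResNet-34 + ASPP | coco_deeplabv3_4bit | coco_deeplabv3_3bit | coco_deeplabv3_2bit |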