Regularization-Pruning

This repository is for the new deep neural network pruning methods introduced in the following ICLR 2021 paper:

Neural Pruning via Growing Regularization [Camera Ready]
Huan Wang, Can Qin, Yulun Zhang, and Yun Fu
Northeastern University, Boston, MA, USA

TLDR: This paper introduces two new neural network pruning methods (named GReg-1 and GReg-2) based on uniformly growing (L2) regularization:

<center><img src="readme_figures/L1norm_vs_iter.png" width="700" hspace="10"></center>

Step 1: Set up environment

After the installations, download the code:

git clone git@github.com:MingSun-Tse/Regularization-Pruning.git -b master

Step 2: Set up dataset

Step 3: Set up pretrained (unpruned) models

# ResNet56, CIFAR10
CUDA_VISIBLE_DEVICES=0 python main.py --arch resnet56 --dataset cifar10 --method L1 --stage_pr [0,0,0,0,0] --batch_size 128 --wd 0.0005 --lr_ft 0:0.1,100:0.01,150:0.001 --epochs 200 --project scratch__resnet56__cifar10

# VGG19, CIFAR100
CUDA_VISIBLE_DEVICES=0 python main.py --arch vgg19 --dataset cifar100 --method L1 --stage_pr [0-18:0] --batch_size 256 --wd 0.0005 --lr_ft 0:0.1,100:0.01,150:0.001 --epochs 200 --project scratch__vgg19__cifar100

where --method specifies the pruning method; --stage_pr specifies the layer-wise pruning ratio (pr is short for pruning ratio; since we train the unpruned models here, all ratios are zero); and --lr_ft sets the learning-rate schedule during finetuning.
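The two flag formats are compact, so a sketch of how they could be decoded may help. The helpers below are hypothetical (the repository's own parsing code may differ in details); they illustrate, under that assumption, the "epoch:lr" schedule syntax and the two --stage_pr spellings seen above:

```python
# Hypothetical parsing helpers for the flag formats used in this README.
# The repository's actual argument handling may differ.

def parse_lr_schedule(s):
    """'0:0.1,100:0.01,150:0.001' -> {0: 0.1, 100: 0.01, 150: 0.001}."""
    return {int(ep): float(lr) for ep, lr in
            (item.split(":") for item in s.split(","))}

def parse_stage_pr(s):
    """Parse '[0,0.75,0.75,0.32,0]' (per-stage list) or '[1-15:0.7]'
    (layer-range syntax) into per-stage / per-layer pruning ratios."""
    body = s.strip("[]")
    if ":" in body:                          # range form, e.g. '1-15:0.7'
        rng, pr = body.split(":")
        lo, hi = (int(x) for x in rng.split("-"))
        return {layer: float(pr) for layer in range(lo, hi + 1)}
    return [float(x) for x in body.split(",")]

print(parse_lr_schedule("0:0.1,100:0.01,150:0.001"))
print(parse_stage_pr("[0,0.75,0.75,0.32,0]"))
print(parse_stage_pr("[1-15:0.7]"))
```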

Step 4: Training (pruning a pretrained model, not from scratch)

1. CIFAR10/100

(1) We use the following snippets to obtain the results on CIFAR10/100 (Table 2 in our paper).

# GReg-1
CUDA_VISIBLE_DEVICES=1 python main.py --method GReg-1 -a resnet56 --dataset cifar10 --wd 0.0005 --lr_ft 0:0.01,60:0.001,90:0.0001 --epochs 120 --base_model_path Experiments/*scratch__resnet56__cifar10*/weights/checkpoint_best.pth --batch_size_prune 128 --batch_size 128 --update_reg_interval 10 --stabilize 10000 --stage_pr [0,0.75,0.75,0.32,0] --project GReg-1__resnet56__cifar10__2.55x_pr0.750.32 --screen

# GReg-2
CUDA_VISIBLE_DEVICES=1 python main.py --method GReg-2 -a resnet56 --dataset cifar10 --wd 0.0005 --lr_ft 0:0.01,60:0.001,90:0.0001 --epochs 120 --base_model_path Experiments/*scratch__resnet56__cifar10*/weights/checkpoint_best.pth --batch_size_prune 128 --batch_size 128 --update_reg_interval 10 --stabilize 10000 --stage_pr [0,0.75,0.75,0.32,0] --project GReg-2__resnet56__cifar10__2.55x_pr0.750.32 --screen

# GReg-1
CUDA_VISIBLE_DEVICES=1 python main.py --method GReg-1 -a vgg19 --dataset cifar100 --wd 0.0005 --lr_ft 0:0.01,60:0.001,90:0.0001 --epochs 120 --base_model_path Experiments/*scratch__vgg19__cifar100*/weights/checkpoint_best.pth --batch_size_prune 256 --batch_size 256 --update_reg_interval 10 --stabilize 10000 --stage_pr [1-15:0.7] --project GReg-1__vgg19__cifar100__8.84x_pr0.7 --screen

# GReg-2
CUDA_VISIBLE_DEVICES=1 python main.py --method GReg-2 -a vgg19 --dataset cifar100 --wd 0.0005 --lr_ft 0:0.01,60:0.001,90:0.0001 --epochs 120 --base_model_path Experiments/*scratch__vgg19__cifar100*/weights/checkpoint_best.pth --batch_size_prune 256 --batch_size 256 --update_reg_interval 10 --stabilize 10000 --stage_pr [1-15:0.7] --project GReg-2__vgg19__cifar100__8.84x_pr0.7 --screen

Note: *scratch__resnet56__cifar10* and *scratch__vgg19__cifar100* refer to the experiments that trained the unpruned models in Step 3.
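The core idea behind the GReg-1 commands above, uniformly growing an L2 penalty on the weights selected for pruning until they are driven toward zero, can be sketched on a toy "layer". Everything below (the weight values, penalty step, ceiling) is illustrative only; the real method runs inside SGD on network filters, with flags such as --update_reg_interval controlling the growth:

```python
# Toy sketch of growing regularization (GReg-1 flavor).
# All constants here are illustrative, not the repository's defaults.
weights = [0.9, -0.05, 0.6, 0.02, -0.8]    # one toy "layer"
stage_pr = 0.4                              # prune 40% of this layer

# GReg-1 fixes the pruned set up front by smallest magnitude (L1 norm).
n_pruned = int(len(weights) * stage_pr)
pruned_idx = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_pruned]

lam, delta, lam_ceil = 0.0, 1e-3, 0.5       # L2 penalty grows up to a ceiling
update_reg_interval, lr = 10, 0.1

step = 0
while lam < lam_ceil:
    step += 1
    if step % update_reg_interval == 0:
        lam += delta * update_reg_interval  # uniformly grow the penalty
    for i in pruned_idx:
        weights[i] -= lr * lam * weights[i] # gradient of (lam/2) * w^2

# By now the selected weights are negligible, so removing them is nearly free.
for i in pruned_idx:
    weights[i] = 0.0
print(weights)
```

Note how the kept weights are untouched while the penalized ones decay smoothly, which is the behavior plotted in the L1-norm-vs-iteration figure above.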

(2) For the results in Table 1, simply change the pruning ratios via --stage_pr.

2. ImageNet

We use the following snippets to obtain the results on ImageNet (Table 3 and 4 in our paper).

# GReg-1
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet34 --pretrained --method GReg-1 --screen --stage_pr [0,0.5,0.6,0.4,0,0] --skip_layers [1.0,2.0,2.3,3.0,3.5] --project GReg-1__resnet34__imagenet__1.32x_pr0.50.60.4

# GReg-2
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet34 --pretrained --method GReg-2 --screen --stage_pr [0,0.5,0.6,0.4,0,0] --skip_layers [1.0,2.0,2.3,3.0,3.5] --project GReg-2__resnet34__imagenet__1.32x_pr0.50.60.4

# GReg-1
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-1 --screen --stage_pr [0,0.3,0.3,0.3,0.14,0] --project GReg-1__resnet50__imagenet__1.49x_pr0.30.14

# GReg-2
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-2 --screen --stage_pr [0,0.3,0.3,0.3,0.14,0] --project GReg-2__resnet50__imagenet__1.49x_pr0.30.14

# GReg-1
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-1 --screen --stage_pr [0,0.6,0.6,0.6,0.21,0] --project GReg-1__resnet50__imagenet__2.31x_pr0.60.21

# GReg-2
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-2 --screen --stage_pr [0,0.6,0.6,0.6,0.21,0] --project GReg-2__resnet50__imagenet__2.31x_pr0.60.21

# GReg-1
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-1 --screen --stage_pr [0,0.74,0.74,0.6,0.21,0] --project GReg-1__resnet50__imagenet__2.56x_pr0.740.60.21

# GReg-2
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-2 --screen --stage_pr [0,0.74,0.74,0.6,0.21,0] --project GReg-2__resnet50__imagenet__2.56x_pr0.740.60.21

# GReg-1
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-1 --screen --stage_pr [0,0.68,0.68,0.68,0.5,0] --project GReg-1__resnet50__imagenet__3.06x_pr0.680.5

# GReg-2
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-2 --screen --stage_pr [0,0.68,0.68,0.68,0.5,0] --project GReg-2__resnet50__imagenet__3.06x_pr0.680.5

# GReg-1
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-1 --wg weight --screen --stage_pr [0,0.827,0.827,0.827,0.827,0.827] --project GReg-1__resnet50__imagenet__wgweight_pr0.827

# GReg-2
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method GReg-2 --wg weight --screen --stage_pr [0,0.827,0.827,0.827,0.827,0.827] --project GReg-2__resnet50__imagenet__wgweight_pr0.827

where --wg weight indicates that the weight group is the individual weight element, i.e., unstructured pruning.
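As a rough illustration of what unstructured (weight-element) pruning means here, a sketch, not the repository's implementation: every individual weight competes by magnitude, and the smallest pr fraction is zeroed, regardless of which filter it belongs to:

```python
# Hedged sketch of unstructured magnitude pruning (the behavior selected
# by `--wg weight`), operating on a flat list of weights.

def prune_unstructured(weights, pr):
    """Zero out the `pr` fraction of weights with smallest magnitude."""
    n_pruned = int(len(weights) * pr)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_pruned:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.6, 0.02, -0.8, 0.3]
print(prune_unstructured(w, 0.5))  # -> [0.9, 0.0, 0.6, 0.0, -0.8, 0.0]
```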

ImageNet Results

Our pruned ImageNet models can be downloaded from this Google Drive. A comparison with other methods is shown below; both structured pruning (filter pruning) and unstructured pruning are evaluated.

Tips for loading our pruned models. The pruned model (both the pruned architecture and the weights) is saved in checkpoint_best.pth. When loading this file with torch.load(), the current working directory MUST be the root of this code repository (loading needs the model module in that directory); otherwise, an error is reported.
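The underlying reason is Python's unpickling: the checkpoint stores a reference to the defining module, which must be importable when the file is loaded. A minimal stdlib-only demonstration of that mechanism (hypothetical module/class names; no PyTorch required):

```python
# Demonstrates why unpickling a saved model object requires its defining
# module to be importable -- analogous to running torch.load() from the
# repository root so the `model` module can be found.
import os, sys, pickle, tempfile

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "model.py"), "w") as f:
    f.write("class Net:\n    def __init__(self):\n        self.pruned = True\n")

sys.path.insert(0, workdir)            # like running from the repo root
import model                            # the module the pickle will reference
blob = pickle.dumps(model.Net())        # like saving checkpoint_best.pth

# Simulate loading from elsewhere: make `model` unimportable again.
sys.path.remove(workdir)
del sys.modules["model"]
load_error = None
try:
    pickle.loads(blob)                  # fails: Python cannot locate `model`
except (ModuleNotFoundError, AttributeError) as e:
    load_error = e
print("load outside repo root failed:", load_error)

sys.path.insert(0, workdir)             # back "inside the repo"
net = pickle.loads(blob)                # now the class resolves
print("load inside repo root succeeded:", net.pruned)
```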

(1) Acceleration (structured pruning) comparison on ImageNet

<center><img src="readme_figures/acceleration_comparison_imagenet.png" width="700" hspace="10"></center>

(2) Compression (unstructured pruning) comparison on ImageNet

<center><img src="readme_figures/compression_comparison_imagenet.png" width="700" hspace="10"></center>

Some useful features

This code also implements some baseline pruning methods that may help you:

# L1-norm (magnitude) pruning
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method L1 --screen --stage_pr [0,0.68,0.68,0.68,0.5,0] --project L1__resnet50__imagenet__3.06x_pr0.680.5

# Random pruning
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method L1 --pick_pruned rand --screen --stage_pr [0,0.68,0.68,0.68,0.5,0] --project L1__resnet50__imagenet__3.06x_pr0.680.5__randpruning

# Max-norm pruning
CUDA_VISIBLE_DEVICES=0,1 python main.py --dataset imagenet --arch resnet50 --pretrained --method L1 --pick_pruned max --screen --stage_pr [0,0.68,0.68,0.68,0.5,0] --project L1__resnet50__imagenet__3.06x_pr0.680.5__maxpruning
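A sketch of what the three --pick_pruned criteria could look like, assuming per-filter L1-norm scores (a hypothetical helper, not the repository's code): the default prunes the smallest-norm filters, rand prunes a random subset, and max prunes the largest-norm filters as a sanity-check baseline:

```python
# Hypothetical selection criteria behind `--pick_pruned`, operating on
# per-filter importance scores (e.g. L1 norms).
import random

def pick_pruned(scores, pr, mode="min"):
    """Return the indices of the filters to prune under the given mode."""
    n = int(len(scores) * pr)
    idx = list(range(len(scores)))
    if mode == "min":                       # prune smallest-norm filters
        idx.sort(key=lambda i: scores[i])
    elif mode == "max":                     # prune largest-norm filters
        idx.sort(key=lambda i: scores[i], reverse=True)
    elif mode == "rand":                    # prune a random subset
        random.shuffle(idx)
    return sorted(idx[:n])

scores = [3.2, 0.4, 1.1, 5.0, 0.9, 2.7]
print(pick_pruned(scores, 0.5, "min"))  # -> [1, 2, 4]
print(pick_pruned(scores, 0.5, "max"))  # -> [0, 3, 5]
```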

Feel free to let us know (raise a GitHub issue or email wang.huan@northeastern.edu; email is recommended for a quicker reply) if you would like a new feature or want the methods evaluated on networks other than those in the paper.

Acknowledgments

This code refers to the following implementations: pytorch imagenet example, rethinking-network-pruning, EigenDamage-Pytorch, and pytorch_resnet_cifar10. Many thanks to them!

Reference

Please cite this in your publication if our work helps your research:

@inproceedings{wang2021neural,
  Author = {Wang, Huan and Qin, Can and Zhang, Yulun and Fu, Yun},
  Title = {Neural Pruning via Growing Regularization},
  Booktitle = {International Conference on Learning Representations (ICLR)},
  Year = {2021}
}