Awesome

Model Compression

Getting started

Prerequisites

This repository is implemented and verified in an Anaconda virtual environment with Python 3.7.

Installation

Clone this repository.

$ git clone https://github.com/j-marple-dev/model_compression.git
$ cd model_compression

Create virtual environment

$ conda env create -f environment.yml 
$ conda activate model_compression

$ make install 
$ conda activate model_compression

(Optional for contributors) Install CI environment

$ conda activate model_compression
$ make dev

(Optional for nvidia gpu) Install cudatoolkit.

$ conda activate model_compression
$ conda install -c pytorch cudatoolkit=${cuda_version}

After setting up the environment, you can validate the code with the following commands.

$ make format  # for formatting
$ make test  # for linting

Docker

Clone this repository.

$ git clone https://github.com/j-marple-dev/model_compression.git
$ cd model_compression

Make sure you have installed Docker Engine and nvidia-docker.
Run the docker image.

$ docker run -it --gpus all --ipc=host -v $PWD:/app/model_compression jmarpledev/model_compression:latest /bin/bash
$ cd model_compression

Usages

Run training

Train the model. The trainer supports the following options:

Basic Settings: batch size, epoch numbers, seed
Stochastic Gradient Descent: momentum, weight decay, initial learning rate, nesterov momentum
Image Augmentation: Autoaugment, Randaugment, CutMix
Loss: Cross Entropy + Label Smoothing, Hinton Knowledge Distillation Loss
Learning Rate Scheduler: Cosine Annealing with Initial Warmups
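
To make the loss options above concrete, here is a minimal pure-Python sketch of cross entropy with label smoothing and of a Hinton-style knowledge distillation loss. It is not the code in src/criterions.py. The function names, the temperature `T`, the mixing weight `alpha`, and the smoothing convention (uniform mass `smoothing / n` added to every class) are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def label_smoothing_ce(logits, target, smoothing=0.1):
    """Cross entropy against a smoothed one-hot target (one common convention)."""
    n = len(logits)
    probs = softmax(logits)
    # Every class receives smoothing / n; the true class also keeps 1 - smoothing.
    soft_target = [smoothing / n] * n
    soft_target[target] += 1.0 - smoothing
    return -sum(t * math.log(p) for t, p in zip(soft_target, probs))


def hinton_kd_loss(student_logits, teacher_logits, target, T=4.0, alpha=0.9):
    """alpha * T^2 * KL(teacher || student) at temperature T, plus (1 - alpha) * hard-label CE."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kld = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    hard = label_smoothing_ce(student_logits, target, smoothing=0.0)
    return alpha * T * T * kld + (1.0 - alpha) * hard


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    teacher = [3.0, 0.0, -2.0]
    print("CE with smoothing 0.1:", label_smoothing_ce(student, target=0))
    print("KD loss:", hinton_kd_loss(student, teacher, target=0))
```

The `T * T` factor keeps the gradient magnitude of the soft-target term comparable across temperatures, which is why it appears in most distillation formulations.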

$ python train.py --help
usage: train.py [-h] [--multi-gpu] [--gpu GPU] [--finetune FINETUNE]
                [--resume RESUME] [--half] [--wlog] [--config CONFIG]

Model trainer.

optional arguments:
  -h, --help           show this help message and exit
  --multi-gpu          Multi-GPU use
  --gpu GPU            GPU id to use
  --finetune FINETUNE  Model path to finetune (.pth.tar)
  --resume RESUME      Input log directory name to resume in save/checkpoint
  --half               Use half precision
  --wlog               Turns on wandb logging
  --config CONFIG      Configuration path (.py)

$ python train.py --config path_to_config.py  # basic run
$ python train.py --config path_to_config.py  --gpu 1 --resume checkpoint_dir_name # resume training on gpu 1

Configurations for training

The following options are available:

Basic Settings: BATCH_SIZE, EPOCHS, SEED, MODEL_NAME(src/models), MODEL_PARAMS, DATASET
Stochastic Gradient Descent: MOMENTUM, WEIGHT_DECAY, LR
Image Augmentation: AUG_TRAIN(src/augmentation/policies.py), AUG_TRAIN_PARAMS, AUG_TEST(src/augmentation/policies.py), CUTMIX
Loss: CRITERION(src/criterions.py), CRITERION_PARAMS
Learning Rate Scheduler: LR_SCHEDULER(src/lr_schedulers.py), LR_SCHEDULER_PARAMS

# Example of train config(config/train/cifar/densenet_121.py)
import os

config = {
    "SEED": 777,
    "AUG_TRAIN": "randaugment_train_cifar100_224",
    "AUG_TRAIN_PARAMS": dict(n_select=2, level=None),
    "AUG_TEST": "simple_augment_test_cifar100_224",
    "CUTMIX": dict(beta=1, prob=0.5),
    "DATASET": "CIFAR100",
    "MODEL_NAME": "densenet",
    "MODEL_PARAMS": dict(
        num_classes=100,
        inplanes=24,
        growthRate=32,
        compressionRate=2,
        block_configs=(6, 12, 24, 16),
        small_input=False,
        efficient=False,
    ),
    "CRITERION": "CrossEntropy", # CrossEntropy, HintonKLD
    "CRITERION_PARAMS": dict(num_classes=100, label_smoothing=0.1),
    "LR_SCHEDULER": "WarmupCosineLR", # WarmupCosineLR, Identity, MultiStepLR
    "LR_SCHEDULER_PARAMS": dict(
        warmup_epochs=5, start_lr=1e-3, min_lr=1e-5, n_rewinding=1
    ),
    "BATCH_SIZE": 128,
    "LR": 0.1,
    "MOMENTUM": 0.9,
    "WEIGHT_DECAY": 1e-4,
    "NESTEROV": True,
    "EPOCHS": 300,
    "N_WORKERS": os.cpu_count(),
}
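
The LR_SCHEDULER_PARAMS in the example can be read as a linear warmup followed by cosine annealing. The sketch below shows one plausible shape for such a schedule using the values from the config above. It is an illustration only, not the WarmupCosineLR class in src/lr_schedulers.py: the function name is hypothetical, the exact warmup formula is an assumption, and `n_rewinding` is deliberately ignored.

```python
import math


def warmup_cosine_lr(epoch, total_epochs=300, base_lr=0.1,
                     warmup_epochs=5, start_lr=1e-3, min_lr=1e-5):
    """Learning rate for a given epoch (hypothetical helper)."""
    if epoch < warmup_epochs:
        # Linear ramp from start_lr toward base_lr during warmup.
        return start_lr + (base_lr - start_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 1, 5, 150, 299, 300):
        print(epoch, warmup_cosine_lr(epoch))
```

Under these assumptions the rate starts at `start_lr`, reaches `LR` when warmup ends, and decays to `min_lr` at the final epoch.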

Run pruning

Pruning makes a model sparse. Pruner supports the following methods:

Unstructured Pruning

Structured (Channel-wise) Pruning

Network Slimming
Magnitude based channel-wise pruning
Slim-Magnitude channel-wise pruning (combination of above two methods)

Unstructured pruning usually reaches higher sparsity, but unstructured-pruned models cannot be shrunk: the shrinker only supports channel-wise pruned models.
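
The difference between the two families can be sketched with plain Python lists. This is an illustration of magnitude-based masking in general, not the code in src/runner/pruner.py. The function names and the toy weight matrix are hypothetical. Each inner list stands for the weights of one output channel.

```python
def unstructured_mask(weights, amount):
    """Mask out the `amount` fraction of individual weights with the smallest |w|.

    Ties at the threshold are all pruned, so slightly more than `amount` may be removed.
    """
    flat = sorted(abs(w) for channel in weights for w in channel)
    k = int(len(flat) * amount)
    if k == 0:
        return [[1] * len(channel) for channel in weights]
    threshold = flat[k - 1]
    return [[0 if abs(w) <= threshold else 1 for w in channel] for channel in weights]


def channel_mask(weights, amount, norm=2):
    """Mask out whole channels, choosing those with the smallest L-`norm` norm."""
    norms = [sum(abs(w) ** norm for w in channel) ** (1.0 / norm) for channel in weights]
    k = int(len(weights) * amount)
    pruned = set(sorted(range(len(weights)), key=lambda i: norms[i])[:k])
    return [[0] * len(ch) if i in pruned else [1] * len(ch)
            for i, ch in enumerate(weights)]


if __name__ == "__main__":
    toy = [
        [0.9, -0.05, 0.4],
        [0.02, -0.03, 0.01],
        [-0.7, 0.6, 0.08],
        [0.3, -0.2, 0.5],
    ]
    print("unstructured:", unstructured_mask(toy, amount=0.5))
    print("channel-wise:", channel_mask(toy, amount=0.25))
```

The unstructured mask scatters zeros across channels, so tensor shapes stay the same. The channel-wise mask zeroes entire rows, which can later be dropped to produce a smaller tensor.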

$ python prune.py --help
usage: prune.py [-h] [--multi-gpu] [--gpu GPU] [--resume RESUME] [--wlog]
                [--config CONFIG]

Model pruner.

optional arguments:
  -h, --help       show this help message and exit
  --multi-gpu      Multi-GPU use
  --gpu GPU        GPU id to use
  --resume RESUME  Input checkpoint directory name
  --wlog           Turns on wandb logging
  --config CONFIG  Configuration path

$ python prune.py --config path_to_config.py  # basic run
$ python prune.py --config path_to_config.py --multi-gpu --wlog  # run on multi-gpu with wandb logging

Configurations for pruning

A pruning configuration extends a training configuration (recommended) with the following options:

Basic Training Settings: TRAIN_CONFIG
Pruning Settings: N_PRUNING_ITER, PRUNE_METHOD(src/runner/pruner.py), PRUNE_PARAMS

# Example of prune config(config/prune/cifar100/densenet_small_l2mag.py)
from config.train.cifar100 import densenet_small

train_config = densenet_small.config
config = {
    "TRAIN_CONFIG": train_config,
    "N_PRUNING_ITER": 15,
    "PRUNE_METHOD": "Magnitude", # LotteryTicketHypothesis, Magnitude, NetworkSlimming, SlimMagnitude
    "PRUNE_PARAMS": dict(
        PRUNE_AMOUNT=0.2,  # it iteratively prunes 20% of the network parameters at the end of trainings
        NORM=2,
        STORE_PARAM_BEFORE=train_config["EPOCHS"],  # used for weight initialization at every pruning iteration
        TRAIN_START_FROM=0,  # training starts from this epoch
        PRUNE_AT_BEST=False,  # if True, it prunes parameters at the trained network which achieves the best accuracy
                              # otherwise, it prunes the network at the end of training
    ),
}
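
As a worked example of what N_PRUNING_ITER and PRUNE_AMOUNT imply together, the sketch below computes the fraction of parameters left after each iteration. It assumes that every iteration removes PRUNE_AMOUNT of the parameters that are still unpruned, as is usual in iterative magnitude pruning. If the pruner instead measures the amount against the original parameter count, the numbers differ. The function name is hypothetical.

```python
def remaining_fraction(prune_amount, n_iter):
    """Fraction of parameters left after n_iter rounds, each pruning `prune_amount` of the survivors."""
    return (1.0 - prune_amount) ** n_iter


if __name__ == "__main__":
    # Values taken from the example config: PRUNE_AMOUNT=0.2, N_PRUNING_ITER=15.
    for it in (1, 5, 10, 15):
        left = remaining_fraction(0.2, it)
        print(f"iteration {it:2d}: {left:.2%} remaining, {1 - left:.2%} sparsity")
```

Under this assumption, 15 iterations at 20% leave about 3.5% of the parameters, since 0.8^15 ≈ 0.0352.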

Run shrinking (Experimental)

Shrinking reshapes a pruned model and reduces its size.

$ python shrink.py --help
usage: shrink.py [-h] [--gpu GPU] [--checkpoint CHECKPOINT] [--config CONFIG]

Model shrinker.

optional arguments:
  -h, --help            show this help message and exit
  --gpu GPU             GPU id to use
  --checkpoint CHECKPOINT
                        input checkpoint path to quantize
  --config CONFIG       Pruning configuration path

$ python shrink.py --config path_to_config.py --checkpoint path_to_checkpoint.pth.tar  # basic run

Important Notes:

The shrinker is currently experimental. It only supports:

channel-wise pruned models
networks that consist of conv-bn-activation sequences
network blocks that have channel concatenation followed by skip connections (e.g. DenseNet)
networks that have only one last fully-connected layer

On the other hand, it doesn't support:

network blocks that have element-wise sum followed by skip connections (e.g. ResNet, MixNet)
networks that have multiple fully-connected layers
Quantization after shrinking
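
The idea behind shrinking a channel-wise pruned model can be sketched on two consecutive layers. This toy example is not the repository's shrinker: it treats each layer as a plain matrix, uses ReLU as the activation, and ignores biases and batch-norm parameters. All names are hypothetical. A fully zeroed output channel in the first layer is dropped together with the matching input column of the second layer, and the network output stays the same.

```python
def shrink_pair(w1, w2):
    """Drop all-zero output channels of w1 and the matching input columns of w2."""
    keep = [i for i, row in enumerate(w1) if any(v != 0.0 for v in row)]
    new_w1 = [w1[i] for i in keep]
    new_w2 = [[row[i] for i in keep] for row in w2]
    return new_w1, new_w2, keep


def forward(w1, w2, x):
    """Two matrix layers with a ReLU in between (no bias, no batch norm)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


if __name__ == "__main__":
    w1 = [[0.5, -1.0], [0.0, 0.0], [1.5, 2.0]]   # channel 1 was pruned
    w2 = [[1.0, 3.0, -2.0], [0.5, -4.0, 1.0]]
    x = [1.0, 2.0]
    s1, s2, kept = shrink_pair(w1, w2)
    print("kept channels:", kept)
    print("before:", forward(w1, w2, x))
    print("after: ", forward(s1, s2, x))
```

With channel concatenation, each branch keeps its own channel indices, so columns can be dropped branch by branch. With an element-wise sum, both branches must keep exactly the same channels, which is one reason sum-based skip connections are harder to shrink.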

Run quantization

It performs one of two 8-bit quantization methods:

post-training static quantization
Quantization-Aware Training
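
Both methods rely on mapping floating-point values to 8-bit integers. The sketch below shows a generic affine (scale and zero-point) mapping in plain Python. It is not the repository's quantizer, and the function names are hypothetical. In post-training static quantization, the value range would come from calibration data. In quantization-aware training, the same rounding is simulated during training so the weights can adapt to it.

```python
def quant_params(x_min, x_max, qmin=0, qmax=255):
    """Derive scale and zero point so that [x_min, x_max] maps onto [qmin, qmax]."""
    # The representable range must contain 0.0 so that zero is quantized exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer in [qmin, qmax]."""
    return max(qmin, min(qmax, int(round(x / scale)) + zero_point))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate float."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    values = [-1.0, -0.25, 0.0, 0.4, 2.0]
    scale, zp = quant_params(min(values), max(values))
    for v in values:
        q = quantize(v, scale, zp)
        print(f"{v:+.3f} -> {q:3d} -> {dequantize(q, scale, zp):+.5f}")
```

For values inside the calibrated range, the round-trip error is at most half a quantization step (`scale / 2`).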

$ python quantize.py --help
usage: quantize.py [-h] [--resume RESUME] [--wlog] [--config CONFIG]
                   [--checkpoint CHECKPOINT]

Model quantizer.

optional arguments:
  -h, --help            show this help message and exit
  --resume RESUME       Input log directory name to resume
  --wlog                Turns on wandb logging
  --static              Post-training static quantization
  --config CONFIG       Configuration path
  --checkpoint CHECKPOINT
                        Input checkpoint path to quantize

$ python quantize.py --config path_to_config.py --checkpoint path_to_checkpoint.pth.tar  # basic qat run
$ python quantize.py --config path_to_config.py --checkpoint path_to_checkpoint.pth.tar --static  # basic static quantization run

Experiment Results

WANDB Log

Unstructured Pruning (LTH vs Weight Rewinding vs LR Rewinding)

Structured Pruning (Slim vs L2Mag vs L2MagSlim)

Shrinking after Structured Pruning

Densenet (L=100, k=12) pruned by 19.66% (Slim & CIFAR100)

Accuracy: 80.37%
Parameters: 0.78M -> 0.51M
Model Size: 6.48MB -> 4.14MB

$ python shrink.py --config config/prune/cifar100/densenet_small_slim.py --checkpoint path_to_checkpoint.pth.tar

2020-08-26 13:50:38,442 - trainer.py:71 - INFO - Created a model densenet with 0.78M params
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:50:42,719 - shrinker.py:104 - INFO - Acc: 80.37, Size: 6.476016 MB, Sparsity: 19.66 %
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:50:45,781 - shrinker.py:118 - INFO - Acc: 80.37, Size: 4.141713 MB, Params: 0.51 M

Densenet (L=100, k=12) pruned by 35.57% (Network Slimming & CIFAR100)

Accuracy: 79.07%
Parameters: 0.78M -> 0.35M
Model Size: 6.48MB -> 2.85MB

$ python shrink.py --config config/prune/cifar100/densenet_small_slim.py --checkpoint path_to_checkpoint.pth.tar

2020-08-26 13:52:58,946 - trainer.py:71 - INFO - Created a model densenet with 0.78M params
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:53:03,100 - shrinker.py:104 - INFO - Acc: 79.07, Size: 6.476016 MB, Sparsity: 35.57 %
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:53:06,114 - shrinker.py:118 - INFO - Acc: 79.07, Size: 2.851149 MB, Params: 0.35 M
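
The relative size reduction for the two shrinking runs can be computed directly from the shrinker.py log lines above. The helper name below is hypothetical. The sizes are copied from the logs.

```python
def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return 1.0 - after / before


# (size before shrinking in MB, size after shrinking in MB), taken from the logs.
runs = {
    "sparsity 19.66%": (6.476016, 4.141713),
    "sparsity 35.57%": (6.476016, 2.851149),
}

if __name__ == "__main__":
    for name, (before, after) in runs.items():
        print(f"{name}: model size reduced by {relative_reduction(before, after):.1%}")
```

This works out to a size reduction of roughly 36% for the first run and roughly 56% for the second, with accuracy unchanged before and after shrinking in both logs.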

Quantization

Post-training Static Quantization

$ python quantize.py --config config/quantize/cifar100/densenet_small.py --checkpoint save/test/densenet_small/296_81_20.pth.tar --static --check-acc

2020-08-26 13:57:02,595 - trainer.py:71 - INFO - Created a model quant_densenet with 0.78M params
2020-08-26 13:57:05,275 - quantizer.py:87 - INFO - Acc: 81.2 %  Size: 3.286695 MB
2020-08-26 13:57:05,344 - quantizer.py:95 - INFO - Post Training Static Quantization: Run calibration
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:40 Time:  0:02:40
2020-08-26 13:59:47,555 - quantizer.py:117 - INFO - Acc: 81.03 %  Size: 0.974913 MB

Quantization-Aware Training

$ python quantize.py --config config/quantize/cifar100/densenet_small.py --checkpoint path_to_checkpoint.pth.tar --check-acc

2020-08-26 14:06:46,855 - trainer.py:71 - INFO - Created a model quant_densenet with 0.78M params
2020-08-26 14:06:49,506 - quantizer.py:87 - INFO - Acc: 81.2 %  Size: 3.286695 MB
2020-08-26 14:06:49,613 - quantizer.py:99 - INFO - Quantization Aware Training: Run training
2020-08-26 14:46:51,857 - trainer.py:209 - INFO - Epoch: [0 | 4]        train/lr: 0.0001        train/loss: 1.984219    test/loss: 1.436638     test/model_acc: 80.96%    test/best_acc: 80.96%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:09 Time:  0:38:09
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:40 Time:  0:02:40
2020-08-26 15:27:43,919 - trainer.py:209 - INFO - Epoch: [1 | 4]        train/lr: 9e-05 train/loss: 1.989543    test/loss: 1.435748     test/model_acc: 80.87%    test/best_acc: 80.96%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:10 Time:  0:38:10
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:36 Time:  0:02:36
2020-08-26 16:08:32,883 - trainer.py:209 - INFO - Epoch: [2 | 4]        train/lr: 6.5e-05       train/loss: 1.984149    test/loss: 1.436074     test/model_acc: 80.82%    test/best_acc: 80.96%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:14 Time:  0:38:14
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:39 Time:  0:02:39
2020-08-26 16:49:28,848 - trainer.py:209 - INFO - Epoch: [3 | 4]        train/lr: 3.5e-05       train/loss: 1.984537    test/loss: 1.43442      test/model_acc: 81.01%    test/best_acc: 81.01%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:19 Time:  0:38:19
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:42 Time:  0:02:42
2020-08-26 17:30:32,187 - trainer.py:209 - INFO - Epoch: [4 | 4]        train/lr: 1e-05 train/loss: 1.990936    test/loss: 1.435393     test/model_acc: 80.92%    test/best_acc: 81.01%
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:37 Time:  0:02:37
2020-08-26 17:33:10,689 - quantizer.py:117 - INFO - Acc: 81.01 %        Size: 0.974913 MB
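
As a quick check of the quantization logs above, the sketch below compares the observed size ratio with the ideal ratio for 32-bit to 8-bit storage. That the baseline model stores 32-bit floats is an assumption. The sizes and accuracies are copied from the quantizer.py log lines.

```python
FLOAT32_BITS = 32  # assumed precision of the baseline weights
INT8_BITS = 8

size_before_mb = 3.286695  # quantizer.py:87 log line
size_after_mb = 0.974913   # quantizer.py:117 log line

ideal_ratio = FLOAT32_BITS / INT8_BITS
observed_ratio = size_before_mb / size_after_mb

# Accuracy change relative to the 81.2% baseline reported in the logs.
acc_drop_static = 81.2 - 81.03
acc_drop_qat = 81.2 - 81.01

if __name__ == "__main__":
    print(f"ideal ratio:    {ideal_ratio:.2f}x")
    print(f"observed ratio: {observed_ratio:.2f}x")
    print(f"accuracy drop (static): {acc_drop_static:.2f} points")
    print(f"accuracy drop (QAT):    {acc_drop_qat:.2f} points")
```

The observed ratio is somewhat below the ideal 4x. A plausible explanation is that a quantized checkpoint also stores scales, zero points, and some parameters that remain in floating point, but the logs do not confirm this.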

Class Diagram

References

Papers

Architecture / Training

Augmentation

Pruning

Knowledge Distillation

Quantization

Quantizing deep convolutional networks for efficient inference: A whitepaper

Implementations / Tutorials

Competition

Architecture / Training

Augmentation

Pruning

Knowledge Distillation

https://github.com/karanchahal/distiller

Quantization

Contributors

Thanks go to these wonderful people (emoji key):

<table> <tr> <td align="center"><a href="https://github.com/Curt-Park"><img src="https://avatars3.githubusercontent.com/u/14961526?v=4?s=100" width="100px;" alt=""/> Jinwoo Park (Curt)</a> <a href="https://github.com/j-marple-dev/model_compression/commits?author=Curt-Park" title="Code">💻</a></td> <td align="center"><a href="https://github.com/Hoonyyhoon"><img src="https://avatars0.githubusercontent.com/u/25141842?v=4?s=100" width="100px;" alt=""/> Junghoon Kim</a> <a href="https://github.com/j-marple-dev/model_compression/commits?author=Hoonyyhoon" title="Code">💻</a></td> <td align="center"><a href="https://github.com/HSShin0"><img src="https://avatars0.githubusercontent.com/u/44793742?v=4?s=100" width="100px;" alt=""/> Hyungseok Shin</a> <a href="https://github.com/j-marple-dev/model_compression/commits?author=HSShin0" title="Code">💻</a></td> <td align="center"><a href="https://www.linkedin.com/in/juhee-lee-393342126/"><img src="https://avatars0.githubusercontent.com/u/18753708?v=4?s=100" width="100px;" alt=""/> Juhee Lee</a> <a href="https://github.com/j-marple-dev/model_compression/commits?author=Ingenjoy" title="Code">💻</a></td> <td align="center"><a href="https://limjk.ai"><img src="https://avatars.githubusercontent.com/u/10356193?v=4?s=100" width="100px;" alt=""/> Jongkuk Lim</a> <a href="https://github.com/j-marple-dev/model_compression/commits?author=JeiKeiLim" title="Code">💻</a></td> <td align="center"><a href="https://github.com/ulken94"><img src="https://avatars.githubusercontent.com/u/58245037?v=4?s=100" width="100px;" alt=""/> Haneol Kim</a> <a href="https://github.com/j-marple-dev/model_compression/commits?author=ulken94" title="Code">💻</a></td> </tr> </table>

This project follows the all-contributors specification. Contributions of any kind welcome!
