Efficient-CNN-Depth-Compression

Official PyTorch implementation of "Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming", published at ICML'23 (Blog post at this link).


Abstract

Recent works on neural network pruning advocate that reducing the depth of the network is more effective in reducing run-time memory usage and accelerating inference latency than reducing the width of the network through channel pruning. In this regard, some recent works propose depth compression algorithms that merge convolution layers. However, the existing algorithms have a constricted search space and rely on human-engineered heuristics. In this paper, we propose a novel depth compression algorithm which targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations for efficient end-to-end inference latency. Since the proposed subset selection problem is NP-hard, we formulate a surrogate optimization problem that can be solved exactly via two-stage dynamic programming within a few seconds. We evaluate our methods and baselines with TensorRT for a fair inference latency comparison. Our method outperforms the baseline method with higher accuracy and faster inference speed for MobileNetV2 on the ImageNet dataset. Specifically, we achieve 1.41× speed-up with a 0.11%p accuracy gain for MobileNetV2-1.0 on ImageNet.

Requirements

  1. Create a conda environment and install the necessary packages with
    conda env create -f asset/icml23.yml
    conda activate icml23
    pip install -r asset/requirements.txt
    
  2. If you further want to measure the inference time with TensorRT, install TensorRT with
    pip install nvidia-tensorrt==8.4.3.1
    
    Then, download the torch_tensorrt wheel from this link and install it by executing the command below in the directory where you downloaded the file.
    pip install torch_tensorrt-1.2.0-cp37-cp37m-linux_x86_64.whl
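    You can sanity-check the installation by importing both packages and printing their versions (they should match the versions installed above):
    python -c "import tensorrt, torch_tensorrt; print(tensorrt.__version__, torch_tensorrt.__version__)"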
    

Results on ImageNet

Accuracy and latency speed-up (measured on an RTX 2080 Ti) of the compressed architectures. Note that we finetune the network after fixing $A$ and $S$, and then merge the layers at test time.

Downloading and Evaluating the Checkpoints

  1. <a id="1"></a>Download the related checkpoints from below links and unzip the files at the root.

    | Method | Networks | Finetune | Checkpoints |
    | --- | --- | --- | --- |
    | Pretrained | MobileNetV2-(1.0/1.4), VGG19 | | download |
    | Ours | MobileNetV2-1.0 | 180 epochs | download |
    | | MobileNetV2-1.4 | 180 epochs | download |
    | | MobileNetV2-(1.0/1.4) | 180 epochs w/ knowledge distillation | download |
    | | VGG19 | 20 epochs | download |
  2. <a id="2"></a>You can evaluate the accuracy and the inference time of the networks by below commands.

    • Evaluating the accuracy of the pretrained MobileNetV2-1.0 (pretrained/mobilenetv2_100_ra-b33bc2c4.pth)

      python exps/main.py -a mobilenet_v2 --width-mult 1.0 -d {$IMAGENET_DIR} -m eval -c pretrained/ -f mobilenetv2_100_ra-b33bc2c4.pth
      
    • Measuring the inference time of the pretrained MobileNetV2-1.0 (pretrained/mobilenetv2_100_ra-b33bc2c4.pth)

      python exps/inference_trt.py -a mobilenet_v2 --width-mult 1.0 -c pretrained/ -f mobilenetv2_100_ra-b33bc2c4.pth --nclass 1000 --trt False
      
    • Evaluating the accuracy of the compressed MobileNetV2-1.0 (kd_exps/mb_v2_w1.0/tl25.0_dt0.3/checkpoint_ft_lr0.05_merged.pth)

      python exps/main.py -a learn_mobilenet_v2 --width-mult 1.0 -d {$IMAGENET_DIR} -m eval -c kd_exps/mb_v2_w1.0/tl25.0_dt0.3/ -f checkpoint_ft_lr0.05_merged.pth
      
    • Measuring the inference time of the compressed MobileNetV2-1.0 (kd_exps/mb_v2_w1.0/tl25.0_dt0.3/checkpoint_ft_lr0.05_merged.pth)

      python exps/inference_trt.py -a learn_mobilenet_v2 --width-mult 1.0 -c kd_exps/mb_v2_w1.0/tl25.0_dt0.3/ -f checkpoint_ft_lr0.05_merged.pth --nclass 1000 --trt False
      
    • You can further obtain results for other configurations by changing the -a and --width-mult options as below (and adjusting the -c and -f options to the correct paths).

      • MobileNetV2-1.0, vanilla : -a mobilenet_v2 --width-mult 1.0
      • MobileNetV2-1.0, compressed : -a learn_mobilenet_v2 --width-mult 1.0
      • MobileNetV2-1.4, vanilla : -a mobilenet_v2 --width-mult 1.4
      • MobileNetV2-1.4, compressed : -a learn_mobilenet_v2 --width-mult 1.4
      • VGG19, vanilla : -a vgg19
      • VGG19, compressed : -a learn_vgg19
    • If you want to measure the inference time in TensorRT, use the --trt True option in exps/inference_trt.py (see the sketch after this list).

  3. Details on the Checkpoints

    • For the pretrained MobileNetV2 networks, we take the weights from timm and rename the keys. These are the same pretrained weights used in the baseline work (DepthShrinker).
    • For the pretrained VGG19 network, we take the weights from torchvision and rename the keys.
    • For the compressed networks, we provide the weights of both the finetuned and the merged networks for MobileNetV2, and the weights of the merged networks for VGG19.
    • Note that we finetune the network after fixing $A$ and $S$, and then merge the layers at test time. Checkpoints whose names end with merged contain the weights of the merged networks.
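For reference, below is a minimal sketch of how such a TensorRT measurement can be done with Torch-TensorRT. It is a hypothetical illustration, not the actual exps/inference_trt.py; here model stands for any already-loaded, eval-mode network on the GPU (e.g., the merged MobileNetV2), and the iteration counts are arbitrary.

```python
import torch
import torch_tensorrt

# `model` is assumed to be an already-loaded eval-mode nn.Module on CUDA.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],  # static ImageNet input shape
    enabled_precisions={torch.float},                 # build an fp32 engine
)

x = torch.randn(1, 3, 224, 224, device="cuda")
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
with torch.no_grad():
    for _ in range(50):    # warm-up iterations
        trt_model(x)
    torch.cuda.synchronize()
    start.record()
    for _ in range(100):   # timed iterations
        trt_model(x)
    end.record()
    torch.cuda.synchronize()
print(f"mean latency: {start.elapsed_time(end) / 100:.3f} ms")
```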

Solving DP from the Table

Paths to the DP tables

Here, we provide the tables necessary to obtain $A$ and $S$ on the ImageNet dataset. Precisely, we provide the optimal latency table $T_{\text{opt}}$ and the importance table $I_{\text{opt}}$ for each network.

After you unzip the importance tables, you will have the necessary files as follows:

| Table | Networks | Path |
| --- | --- | --- |
| $T_{\text{opt}}$ | MBV2-1.0 | utils/table/mbv2_1.0/opt_time_fish_gpu1_1228.csv |
| | MBV2-1.4 | utils/table/mbv2_1.4/opt_time_fish_gpu1_0103.csv |
| | VGG19 | utils/table/vgg19_no_trt/opt_time_fish_gpu1_0317.csv |
| $I_{\text{opt}}$ | MBV2-1.0 | exp_result/dp_imp/mb_v2_w1.0_ie1_ild_cos_ex/ext_importance_s_val_acc_n_single_a_1.6.csv |
| | MBV2-1.4 | exp_result/dp_imp/mb_v2_w1.4_ie1_ild_cos_ex/ext_importance_s_val_acc_n_single_a_1.2.csv |
| | VGG19 | exp_result/dp_imp/vgg19_ie1_ild_cos/ext_importance_s_val_acc_n_single_a_1.4.csv |
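To give intuition for how these two tables are combined (a simplified, hypothetical sketch only; the repository's actual two-stage solver differs in detail), one can read $T_{\text{opt}}[i][j]$ as the latency and $I_{\text{opt}}[i][j]$ as the importance of merging the block of layers $(i, j]$ into a single convolution, and solve the resulting budgeted segmentation problem by dynamic programming:

```python
import numpy as np

def solve_surrogate_dp(T, I, budget, n_bins=200):
    """Choose merge segments maximizing importance under a latency budget.

    T[i, j] / I[i, j]: latency / importance of merging layers (i, j] into a
    single convolution. A simplified sketch, not the repository's exact
    two-stage algorithm (which also recovers the activation set A); it
    assumes the budget is feasible.
    """
    N = T.shape[0] - 1                      # number of layers
    scale = n_bins / budget                 # discretize latency into bins
    f = np.full((N + 1, n_bins + 1), -np.inf)
    parent = np.zeros((N + 1, n_bins + 1), dtype=int)
    f[0, 0] = 0.0
    for j in range(1, N + 1):
        for i in range(j):                  # candidate segment (i, j]
            cost = int(np.ceil(T[i, j] * scale))
            if cost > n_bins:
                continue                    # this segment alone exceeds the budget
            for t in range(cost, n_bins + 1):
                cand = f[i, t - cost] + I[i, j]
                if cand > f[j, t]:
                    f[j, t] = cand
                    parent[j, t] = i
    # Trace back the chosen segment boundaries (the merge positions S).
    t = int(f[N].argmax())
    merge_pos, j = [], N
    while j > 0:
        i = parent[j, t]
        merge_pos.append(j)
        t -= int(np.ceil(T[i, j] * scale))
        j = i
    return float(f[N].max()), sorted(merge_pos)
```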

Obtaining the optimal sets (A and S)

To obtain the optimal sets ($A$ and $S$) for different neural networks, execute the following commands. Make sure to specify the time budget ($T_0$) by using the --time-limit option in the command.

After it completes, you can find the results in the checkpoint.pth file, which contains a dictionary with keys act_pos and merge_pos, corresponding to the sets $A$ and $S$, respectively.
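For example, the solved sets can be inspected directly (assuming the checkpoint was written as described above):

```python
import torch

ckpt = torch.load("checkpoint.pth", map_location="cpu")
print(ckpt["act_pos"])    # the set A: positions of activation layers that are kept
print(ckpt["merge_pos"])  # the set S: positions where convolutions get merged
```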

Finetuning from the Optimal Sets

Once you have acquired the optimal sets ($A$ and $S$), finetune the network from the pretrained weights after fixing the activation layers and padding according to the optimal sets.
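As a minimal sketch of what this fixing amounts to (a hypothetical helper, not the repository's code; it assumes act_pos indexes the network's ReLU-type layers in order of appearance):

```python
import torch.nn as nn

def fix_activations(model, act_pos):
    """Replace every ReLU-type layer NOT in act_pos (the set A) with identity,
    so that the surrounding convolutions can later be merged."""
    idx = 0
    for name, module in list(model.named_modules()):
        if isinstance(module, (nn.ReLU, nn.ReLU6)):
            if idx not in act_pos:
                parent_name, _, child_name = name.rpartition(".")
                parent = model.get_submodule(parent_name) if parent_name else model
                setattr(parent, child_name, nn.Identity())
            idx += 1
    return model
```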

If you haven't followed the previous steps, you can download the optimal sets and pretrained weights using the links below:

We provide example finetuning commands for each network below.

After it completes, you can find the finetuned weights in the checkpoint_ft_lr{$LR}.pth file.

Merging the Finetuned Network

Once you have finetuned the network, merge it starting from the finetuned weights. If you haven't followed the previous steps, you can download the finetuned weights using the links in this bullet.
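Conceptually, the merge step uses the classical fact that two consecutive convolutions with no nonlinearity in between collapse into a single convolution of kernel size $k_1 + k_2 - 1$. Below is a minimal sketch for plain dense, stride-1 convolutions (a hypothetical helper; the repository's merge also handles the general cases, e.g., depthwise layers and strides):

```python
import torch
import torch.nn.functional as F

def merge_two_convs(w1, b1, w2, b2):
    """Collapse y = conv(w2, conv(w1, x) + b1) + b2 into one convolution.

    w1: (mid, in, k1, k1), w2: (out, mid, k2, k2); stride 1 assumed, and the
    merged layer must use padding p1 + p2. Zero-padding boundary effects are
    why the padding is fixed before finetuning. Returns (weight, bias).
    """
    k1 = w1.shape[-1]
    # Composing two cross-correlations: correlate w2 with the channel-
    # transposed, spatially flipped w1 to get an (out, in, k1+k2-1, k1+k2-1)
    # kernel.
    w = F.conv2d(w2, w1.permute(1, 0, 2, 3).flip(-1, -2), padding=k1 - 1)
    # The bias b1 passes through w2 as a constant per-output-channel offset.
    b = b2 + w2.sum(dim=(2, 3)) @ b1
    return w, b
```

Applying such a collapse across every segment selected by $S$ is what turns a finetuned checkpoint into its merged counterpart.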

We provide example merging commands for each network below (you might need to adjust the -c and -f options to the proper paths if you merge the downloaded checkpoints).

After it completes, you can find the merged weights in the checkpoint_ft_lr{$LR}_merged.pth file.

You can evaluate the accuracy and measure the inference time using the commands in this bullet. Make sure to adjust the -c and -f options to proper paths. To illustrate, the following commands can be used to evaluate each merged network.

Citation

@inproceedings{kim2023efficient,
  title={Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming},
  author={Kim, Jinuk and Jeong, Yeonwoo and Lee, Deokjae and Song, Hyun Oh},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2023}
}