Structural Pruning via Latency-Saliency Knapsack

This repository is the official PyTorch implementation of the NeurIPS 2022 paper Structural Pruning via Latency-Saliency Knapsack.

<img src="assets/pipeline.png">

License

Please check the LICENSE file. HALP may be used non-commercially, meaning for research or evaluation purposes only. For business inquiries, please contact researchinquiries@nvidia.com.

Requirements

  1. Prepare Environment

    To run the code and reproduce the results, we highly recommend building the Docker image from the provided Dockerfile.

    Alternatively, run the code in a virtual environment with Python 3.6 and install the required packages:

    pip install torch==1.4.0
    pip install torchvision==0.5.0
    pip install numpy
    pip install Pillow
    pip install PyYAML
    pip install pandas
    

    Additionally, install the NVIDIA APEX library for FP16 support (see Installing NVIDIA APEX).

  2. Download Pretrained Models

    The pretrained baseline models are available on Google Drive. Download them and place them in the model_ckpt/ folder.

  3. Download Latency LUT

    The latency lookup table is provided: ResNet50_on_TitanV

    Download the latency lookup table file and place it in the LUT/ folder.

  4. Prepare Data

    Download ImageNet-1K and set data_root in the config file to the path of the dataset.
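Before launching a run, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of the repository: the helper names `missing_modules` and `missing_dirs` are hypothetical, and it assumes the script is run from the repository root. It only checks that the listed packages are importable and that the model_ckpt/ and LUT/ folders exist; it does not validate versions, file contents, or data_root.

```python
import importlib.util
from pathlib import Path

# Import names of the packages listed in step 1.
# Pillow is imported as "PIL" and PyYAML as "yaml".
REQUIRED_MODULES = ["torch", "torchvision", "numpy", "PIL", "yaml", "pandas"]
# APEX is only needed for FP16 support.
OPTIONAL_MODULES = ["apex"]
# Folder names taken from steps 2 and 3.
EXPECTED_DIRS = ["model_ckpt", "LUT"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_dirs(repo_root, names=EXPECTED_DIRS):
    """Return the expected sub-folders that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    # "." assumes the current directory is the repository root.
    print("missing required modules:", missing_modules(REQUIRED_MODULES))
    print("missing optional modules:", missing_modules(OPTIONAL_MODULES))
    print("missing folders:", missing_dirs("."))
```

An empty list on each line suggests the environment and folder layout match the steps above.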

Running the code

Train a ResNet50 baseline

python multiproc.py --nproc_per_node 8 main.py --exp configs/exp_configs/rn50_imagenet_baseline.yaml --no_prune

Prune a ResNet50

python multiproc.py --nproc_per_node 8 main.py --exp configs/exp_configs/rn50_imagenet_prune.yaml --pretrained model_ckpt/resnet50_full.pth

Evaluate a pruned ResNet50 before removing the zero weights

python multiproc.py --nproc_per_node 8 main.py --pretrained model_ckpt/resnet50_halp55.pth --eval_only

Evaluate a pruned ResNet50 after removing the zero weights

python multiproc.py --nproc_per_node 8 main.py --pretrained model_ckpt/resnet50_halp55_clean.pth --mask model_ckpt/resnet50_halp55_group_mask.pkl --eval_only

Measure the actual latency of a pruned model

python profile.py --model_path model_ckpt/resnet50_halp55_clean.pth --mask_path model_ckpt/resnet50_halp55_group_mask.pkl
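For readers who want to time a model outside of profile.py, the general pattern is warm-up iterations followed by an averaged timed loop. The sketch below is a generic illustration and makes no claim about how profile.py measures latency; `mean_latency_ms` and its parameters are hypothetical names. The optional `sync` callback exists because GPU work is launched asynchronously, so the clock should only be read after queued work has finished.

```python
import time


def mean_latency_ms(fn, warmup=10, iters=50, sync=None):
    """Return the average wall-clock time of fn() in milliseconds.

    `sync`, if given, is called before each clock reading so that any
    asynchronously queued work has completed.
    """
    # Warm-up runs are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iters


if __name__ == "__main__":
    # CPU-only stand-in workload; replace with a model forward pass.
    print(f"{mean_latency_ms(lambda: sum(range(10000))):.4f} ms")
```

With PyTorch on a GPU, one would typically pass `sync=torch.cuda.synchronize` and make `fn` a forward pass on a fixed input batch under `torch.no_grad()`.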

Results on ImageNet

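| Model | FLOPs | Top-1 Acc (%) | Top-5 Acc (%) | FPS | Checkpoint |
|---|---|---|---|---|---|
| ResNet50 | 2.998G | 77.44 | 93.74 | 1213 | RN50-HALP80 |
| ResNet50 | 1.957G | 76.47 | 93.11 | 1674 | RN50-HALP55 |
| ResNet50 | 1.113G | 74.41 | 91.85 | 2610 | RN50-HALP30 |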
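To compare the checkpoints at a glance, the FPS column can be converted into an amortized time per image and a throughput ratio. This is a small arithmetic sketch on the numbers in the table above; it assumes FPS means images per second, and the function names are hypothetical. Amortized time per image is not the same as single-image latency when inference is batched.

```python
# FPS values copied from the results table above.
FPS = {"RN50-HALP80": 1213, "RN50-HALP55": 1674, "RN50-HALP30": 2610}


def ms_per_image(fps):
    """Amortized time per image in milliseconds for a given throughput."""
    return 1000.0 / fps


def relative_throughput(fps_by_name, reference):
    """Throughput of each entry divided by that of `reference`."""
    base = fps_by_name[reference]
    return {name: value / base for name, value in fps_by_name.items()}


if __name__ == "__main__":
    ratios = relative_throughput(FPS, "RN50-HALP80")
    for name, value in FPS.items():
        print(f"{name}: {ms_per_image(value):.3f} ms/image, "
              f"{ratios[name]:.2f}x the throughput of RN50-HALP80")
```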

Citation

@inproceedings{shen2022structural,
    title={Structural Pruning via Latency-Saliency Knapsack},
    author={Shen, Maying and Yin, Hongxu and Molchanov, Pavlo and Mao, Lei and Liu, Jianna and Alvarez, Jose},
    booktitle={Advances in Neural Information Processing Systems},
    year={2022}
}