

Code for "How Well Do Sparse ImageNet Models Transfer?"

This code allows replicating the transfer learning experiments, for both linear and full finetuning, of the CVPR 2022 paper "How Well Do Sparse ImageNet Models Transfer?". The arXiv version of our paper can be found here.

Part of our implementation is based on the open-source code accompanying the NeurIPS 2021 paper "AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks", which can be found here. The dataset loading functionality was adapted from the code for the NeurIPS 2020 paper "Do Adversarially Robust ImageNet Models Transfer Better?" (Salman, Ilyas et al.), available here.


For our main transfer learning experiments, we use 12 benchmark datasets. Please see our paper for the complete list of these datasets, along with the corresponding citations. The instructions for downloading the relevant datasets can be found here.

ImageNet Checkpoints

Some of the sparse ImageNet checkpoints we have tested are publicly available (AC/DC, RigL, STR). Others we trained ourselves, for example GMP and STR without label smoothing (following the official implementation). Finally, the WoodFisher and LTH-T checkpoints were provided by the authors upon request.

For MobileNetV1 models we additionally used M-FAC checkpoints, which are also publicly available here.

All checkpoints, whether created by us or by others, except for the LTH-T checkpoints (which are not public), have been converted to a common format so that they are easy to import. The converted checkpoints can be found at the following links:

Transfer Learning Results

Raw results

The raw numerical results presented in the paper, for both linear and full finetuning, using ResNet50 and MobileNetV1 architectures, can be found in the raw_results folder.

Downstream Checkpoints

All checkpoints can be found here.

We make available one checkpoint for each transfer method/pruning method/sparsity/dataset.

The ResNet50 downstream checkpoints are available at the following links:

The MobileNetV1 downstream checkpoints are available at the following links:

To reproduce our results:

We recommend access to at least one GPU for each experiment; because the batch sizes are small, we were able to train even the larger networks on a single GPU. We recommend PyTorch 1.8 and, if possible, Weights & Biases (Wandb). If using Wandb is not possible, the code should be run with the --use_wandb flag disabled.

For convenience, we provide shell scripts to execute the most common experiments. Below are example commands to run ResNet50 experiments (the first argument is the GPU number(s), the second is the names of the datasets, and the third is the paths to these datasets on disk):

Important: Please keep in mind that for full finetuning, our experiments on the Aircraft and Cars datasets use a higher learning rate than those on all other datasets. For this reason, we provide full finetuning scripts with two different learning rates.

Likewise, there are some special settings for training RigL models. First, the original RigL models were trained with images resized using bicubic interpolation (the default in TensorFlow), while all other upstream checkpoints were trained using images resized with bilinear interpolation. In addition, the layer names differ between RigL and all other models, so the configuration file that specifies which layers to load is also different. While these modifications apply to both full and linear finetuning, they must be specified explicitly only in the full finetuning runs; in the linear finetuning runs they are handled automatically for any upstream checkpoint with "rigl" in the name. Sample scripts that handle this correctly are provided for the full finetuning case.

./run_dataset_generalize_full_training_lr01.sh 1 cars /path/to/cars/data

./run_dataset_generalize_full_training_lr001.sh 1 cifar10 /path/to/cifar10/data

./run_dataset_generalize_full_training_rigl_lr001.sh 1 cifar10 /path/to/cifar10/data

./run_dataset_generalize_preextracted.sh 1 cars /path/to/cars/data
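
When running full finetuning over several datasets, the learning-rate rule described above can be captured in a small wrapper. The following is only a sketch, not part of the repository: it assumes, as the example commands suggest, that the lr01 script is the higher-learning-rate one, that "aircraft" and "cars" are the dataset names the scripts expect (only "cars" and "cifar10" appear in the examples), and that datasets sit under a hypothetical /path/to/data/DATASET layout. It does not cover RigL checkpoints, which need the RigL-specific scripts. The loop only prints the commands; nothing is launched.

```shell
#!/usr/bin/env bash
# Sketch: choose the full-finetuning script for a given dataset.
# Assumption: "aircraft" and "cars" are the names used for the two
# datasets that need the higher learning rate.
pick_full_finetuning_script() {
    local dataset="$1"
    case "$dataset" in
        aircraft|cars) echo "./run_dataset_generalize_full_training_lr01.sh" ;;
        *)             echo "./run_dataset_generalize_full_training_lr001.sh" ;;
    esac
}

GPU=1
DATA_ROOT="/path/to/data"   # hypothetical layout: one subfolder per dataset

# Dry run: print each command instead of executing it.
for dataset in cars cifar10; do
    echo "$(pick_full_finetuning_script "$dataset") $GPU $dataset $DATA_ROOT/$dataset"
done
```

To actually launch the runs, the echo in the loop would be replaced by a direct invocation of the selected script with the same three arguments.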

To run linear finetuning experiments using the DeepSparse inference engine:

./run_dataset_generalize_linear_finetuning_deepsparse.sh DSET /PATH/TO/DSET

./run_dataset_generalize_linear_finetuning_deepsparse_rigl.sh DSET /PATH/TO/DSET
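
The name-based RigL rule mentioned earlier (an upstream checkpoint with "rigl" in its name) can be mirrored when deciding which of the two DeepSparse scripts to call. The sketch below is an illustration under assumptions: the checkpoint file names are hypothetical, the match is a plain lowercase substring test on the file name, and how the repository's own code performs this check is not shown here. It only prints the selected script.

```shell
#!/usr/bin/env bash
# Sketch: detect a RigL checkpoint from its file name.
# Assumption: a literal, lowercase "rigl" substring in the file name
# (not in the directory path) identifies a RigL checkpoint.
is_rigl_checkpoint() {
    case "$(basename "$1")" in
        *rigl*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Choose between the two DeepSparse linear finetuning scripts.
pick_deepsparse_script() {
    if is_rigl_checkpoint "$1"; then
        echo "./run_dataset_generalize_linear_finetuning_deepsparse_rigl.sh"
    else
        echo "./run_dataset_generalize_linear_finetuning_deepsparse.sh"
    fi
}

# Hypothetical checkpoint paths, used only to show the selection.
for ckpt in /ckpts/resnet50_rigl_80.pth /ckpts/resnet50_acdc_80.pth; do
    echo "$ckpt -> $(pick_deepsparse_script "$ckpt")"
done
```

Matching on the base name rather than the full path avoids misclassifying a non-RigL checkpoint that happens to be stored in a directory named after RigL.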

Structure of the repository



If you found this repository useful, please consider citing our work.

  title={How Well Do Sparse Imagenet Models Transfer?},
  author={Iofinova, Eugenia and Peste, Alexandra and Kurtz, Mark and Alistarh, Dan},
  journal={arXiv preprint arXiv:2111.13445},