Outlier Channel Splitting

OCS is a technique to improve post-training quantization which splits (i.e. duplicates then divides by two) channels containing large outlier weights in a layer. This reduces the dynamic range of the weights and reduces quantization error.
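The splitting step can be illustrated with a minimal pure-Python sketch. This is not the code from this repository: the function names `split_outlier_channel` and `matvec` are hypothetical, and a tiny fully connected layer stands in for a real network layer. The sketch duplicates the input channel holding the largest-magnitude weight, halves both copies, and duplicates the matching input, so the layer output is unchanged while the largest weight magnitude is halved.

```python
def matvec(weights, inputs):
    """Compute y = W x for a weight matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def split_outlier_channel(weights, inputs):
    """Split the input channel (column) that contains the largest |weight|.

    Hypothetical helper for illustration only: the column is duplicated,
    both copies are halved, and the matching input is duplicated.
    """
    num_channels = len(inputs)
    col = max(range(num_channels),
              key=lambda j: max(abs(row[j]) for row in weights))
    new_weights = [row[:col] + [row[col] / 2, row[col] / 2] + row[col + 1:]
                   for row in weights]
    new_inputs = inputs[:col] + [inputs[col], inputs[col]] + inputs[col + 1:]
    return new_weights, new_inputs


W = [[1.0, 8.0], [0.5, -2.0]]   # 8.0 is the outlier weight
x = [3.0, 1.0]
W_split, x_split = split_outlier_channel(W, x)

print(matvec(W, x), matvec(W_split, x_split))        # identical layer outputs
print(max(abs(w) for row in W_split for w in row))   # largest weight is now 4.0
```

The layer grows by one channel per split, which is the cost OCS pays for the smaller dynamic range.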

Citation

We published OCS in ICML 2019. See our paper here. Please use the citation below if you reference the work.

@article{zhao2019ocs,
  title="{Improving Neural Network Quantization without Retraining using Outlier Channel Splitting}",
  author={Zhao, Ritchie and Hu, Yuwei and Dotzel, Jordan and De Sa, Chris and Zhang, Zhiru},
  journal={International Conference on Machine Learning (ICML)},
  pages={7543--7552},
  month={June},
  year={2019}
}

Installation

We implement OCS in PyTorch using the Distiller library. The master branch contains the code to do weight OCS only, and scripts to help replicate the results in Table 1 from our paper.

Distiller requires Python-3.5 or Python-3.6. The required packages can be installed via:

pip install -r requirements.txt --user

Usage

Our experiments are run from the directory OCS-CNN. The main program is compress_classifier.py; an example showing which arguments to set is given below:

python compress_classifier.py \
    # Path to the ImageNet data
    %DATA_DIR% \
    # Model (--help shows a list of predefined models)
    -a resnet50 \
    # Batch size, data loaders, validation split
    -b 128 -j 1 --vs 0 \
    # Inference only, use pretrained model
    --evaluate --pretrained \
    # Activation and weight bitwidth
    --act-bits 8 --weight-bits 6 \
    # Use OCS method
    --quantize-method ocs \
    # Weight expand ratio in each layer
    --weight-expand-ratio 0.02 \
    # Weight clip threshold c
    #  c  >  0  --> clip to c*max(W)
    #  c ==  0  --> MSE clipping
    #  c == -1  --> ACIQ clipping
    #  c == -2  --> Entropy clipping
    --weight-clip-threshold 1.0 \
    # Activation clip threshold, see above
    --act-clip-threshold 1.0 \
    # Number of activation profiling batches
    # taken from the training set
    --profile-batches 4

This example is included as example.sh. One experiment over the ImageNet validation set takes about 15 minutes on a GTX 1080 Ti.
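To make the c > 0 case of the clip threshold concrete, here is a minimal sketch of clipping to c*max(|W|) followed by uniform quantization. It assumes a symmetric linear quantizer with 2^(bits-1) - 1 positive levels and uses the maximum absolute weight, which may differ from what the repository implements. The function name `clip_and_quantize` is hypothetical.

```python
def clip_and_quantize(weights, bits, c=1.0):
    """Clip weights to +/- c*max(|W|), then round to a uniform grid.

    Assumption: symmetric linear quantization; only c > 0 is handled here
    (the MSE, ACIQ and entropy modes are not sketched).
    """
    if c <= 0:
        raise ValueError("this sketch only covers c > 0")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)            # nothing to quantize
    threshold = c * max_abs
    levels = 2 ** (bits - 1) - 1        # positive quantization levels
    scale = threshold / levels          # width of one quantization step
    quantized = []
    for w in weights:
        clipped = max(-threshold, min(threshold, w))
        quantized.append(round(clipped / scale) * scale)
    return quantized


weights = [0.1, -0.2, 0.05, 1.0]
print(clip_and_quantize(weights, bits=6, c=1.0))  # no clipping, coarse grid
print(clip_and_quantize(weights, bits=6, c=0.5))  # outlier clipped, finer grid
```

A smaller c gives the small weights a finer grid at the price of distorting the clipped outlier, which is the trade-off the clipping experiments explore.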

Batch Experiments

The directory OCS-CNN/scripts contains scripts to help replicate Table 1 from our paper. Use ocs_script.py to run many configurations in a loop, then parse_ocs.py <log_dir> to parse the results. Other scripts in the same directory run the clipping experiments.
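As a rough illustration of what running many configurations in a loop can look like, the sketch below builds one compress_classifier.py command line per (weight bitwidth, expand ratio) pair. It does not reproduce ocs_script.py: the function name `build_commands`, the grid values, and the data path are assumptions, and only flags from the usage example above are used.

```python
import itertools


def build_commands(data_dir, weight_bits_list, expand_ratios, arch="resnet50"):
    """Return one argument list per (weight bits, expand ratio) combination."""
    commands = []
    for bits, ratio in itertools.product(weight_bits_list, expand_ratios):
        commands.append([
            "python", "compress_classifier.py", data_dir,
            "-a", arch, "-b", "128", "-j", "1", "--vs", "0",
            "--evaluate", "--pretrained",
            "--act-bits", "8", "--weight-bits", str(bits),
            "--quantize-method", "ocs",
            "--weight-expand-ratio", str(ratio),
            "--weight-clip-threshold", "1.0",
            "--act-clip-threshold", "1.0",
            "--profile-batches", "4",
        ])
    return commands


# Placeholder path and grid values, chosen only for illustration.
for cmd in build_commands("/path/to/imagenet", [6, 5], [0.01, 0.02]):
    print(" ".join(cmd))  # hand cmd to subprocess.run(cmd) to execute it
```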

Code Overview

The OCS quantization pass is inside distiller/quantization/. The main files are ocs.py, ocs_impl.py, and clip.py.

Distiller README

<center> <img src="imgs/banner1.png"></center>

<div align="center"> <h3> <a href="https://github.com/NervanaSystems/distiller/wiki"> Wiki and tutorials </a> <span> | </span> <a href="https://nervanasystems.github.io/distiller/index.html"> Documentation </a> <span> | </span> <a href="#getting-started"> Getting Started </a> <span> | </span> <a href="https://nervanasystems.github.io/distiller/algo_pruning/index.html"> Algorithms </a> <span> | </span> <a href="https://nervanasystems.github.io/distiller/design/index.html"> Design </a> <span> | </span> <a href="https://nervanasystems.github.io/distiller/model_zoo/index.html"> Model Zoo </a> </h3> </div>

Distiller is an open-source Python package for neural network compression research.

Network compression can reduce the memory footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low-precision arithmetic.

Feature set

Highlighted features:

Installation

These instructions will help get Distiller up and running on your local machine.

  1. Clone Distiller
  2. Create a Python virtual environment
  3. Install dependencies

Notes:

Clone Distiller

Clone the Distiller code repository from GitHub:

$ git clone https://github.com/NervanaSystems/distiller.git

The rest of this documentation assumes that you have cloned the repository to a directory called distiller.

Create a Python virtual environment

We recommend using a Python virtual environment, but that is, of course, up to you. There's nothing special about using Distiller in a virtual environment, but we provide some instructions for completeness.<br> Before creating the virtual environment, make sure you are in the distiller directory. After creating the environment, you should see a directory called distiller/env.

Using virtualenv

If you don't have virtualenv installed, you can find the installation instructions here.

To create the environment, execute:

$ python3 -m virtualenv env

This creates a subdirectory named env where the Python virtual environment is stored. The current shell does not use the environment until you activate it, as described under "Activate the environment".

Using venv

If you prefer to use venv, then begin by installing it:

$ sudo apt-get install python3-venv

Then create the environment:

$ python3 -m venv env

As with virtualenv, this creates a directory called distiller/env.<br>

Activate the environment

The environment activation and deactivation commands for venv and virtualenv are the same.<br> NOTE: Make sure to activate the environment before proceeding with the installation of the dependency packages:

$ source env/bin/activate

Install dependencies

Finally, install Distiller's dependency packages using pip3:

$ pip3 install -r requirements.txt

PyTorch is included in the requirements.txt file, and will currently download PyTorch version 0.4.0 for CUDA 8.0. This is the setup we've used for testing Distiller.

Getting Started

You can jump head-first into some limited examples of network compression, to get a feeling for the library without too much investment on your part.

Distiller comes with a sample application for compressing image classification DNNs, compress_classifier.py located at distiller/examples/classifier_compression.

We'll show you how to use it for some simple use-cases, and will point you to some ready-to-go Jupyter notebooks.

For more details, there are some other resources you can refer to:

Example invocations of the sample application

Training-only

The following invokes training-only (no compression) of a network named 'simplenet' on the CIFAR10 dataset. This is roughly based on TorchVision's sample ImageNet training application, so it should look familiar if you've used that application. In this example we don't invoke any compression mechanisms: we only train, because training is an essential part of fine-tuning after pruning.<br>
Note that the first time you execute this command, the CIFAR10 dataset will be downloaded to your machine, which may take some time. Please let the download run to completion.

The path to the CIFAR10 dataset is arbitrary, but in our examples we place the datasets in the same directory level as distiller (i.e. ../../../data.cifar10).

First, change to the sample directory, then invoke the application:

$ cd distiller/examples/classifier_compression
$ python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p 30 -j=1 --lr=0.01

You can use a TensorBoard backend to view the training progress (in the diagram below we show a couple of training sessions with different LR values). For compression sessions, we've added tracing of activation and parameter sparsity levels, and regularization loss.

<center> <img src="imgs/simplenet_training.png"></center>

Getting parameter statistics of a sparsified model

We've included in the git repository a few checkpoints of a ResNet20 model that we've trained with 32-bit floats. Let's load the checkpoint of a model that we've trained with channel-wise Group Lasso regularization.<br> With the following command-line arguments, the sample application loads the model (--resume) and prints statistics about the model weights (--summary=sparsity). This is useful if you want to load a previously pruned model, to examine the weights sparsity statistics, for example. Note that when you resume a stored checkpoint, you still need to tell the application which network architecture the checkpoint uses (-a=resnet20_cifar):

$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_ch_regularized_dense.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=sparsity
<center> <img src="imgs/ch_sparsity_stats.png"></center>

You should see a text table detailing the various sparsities of the parameter tensors. The first column is the parameter name, followed by its shape and the number of non-zero elements (NNZ) in the dense model and in the sparse model. The next set of columns shows the column-wise, row-wise, channel-wise, kernel-wise, filter-wise and element-wise sparsities. <br> The final columns are the standard deviation, mean, and mean of absolute values of the elements.
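For readers unfamiliar with these statistics, the following sketch computes a few of them for a small 2-D weight matrix. It is an illustration rather than Distiller's implementation: the function name `sparsity_summary` is hypothetical, sparsities are returned as fractions rather than percentages, and the population standard deviation is an assumption.

```python
import statistics


def sparsity_summary(matrix):
    """Summarize a 2-D weight matrix given as a list of equal-length rows."""
    rows, cols = len(matrix), len(matrix[0])
    flat = [v for row in matrix for v in row]
    zero_rows = sum(1 for row in matrix if all(v == 0 for v in row))
    zero_cols = sum(1 for j in range(cols)
                    if all(row[j] == 0 for row in matrix))
    return {
        "nnz": sum(1 for v in flat if v != 0),            # non-zero elements
        "element_sparsity": sum(1 for v in flat if v == 0) / len(flat),
        "row_sparsity": zero_rows / rows,                 # all-zero rows
        "col_sparsity": zero_cols / cols,                 # all-zero columns
        "std": statistics.pstdev(flat),
        "mean": statistics.mean(flat),
        "abs_mean": statistics.mean([abs(v) for v in flat]),
    }


weights = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, -1.5, 2.0],
]
for name, value in sparsity_summary(weights).items():
    print(name, value)
```

Structured sparsities (row, column, channel, filter) count only groups that are entirely zero, which is why they can be much lower than the element-wise figure.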

In the Compression Insights notebook we use matplotlib to plot a bar chart of this summary, which indeed shows unimpressive footprint compression.

<center> <img src="imgs/ch_sparsity_stats_barchart.png"></center>

Although the memory footprint compression is very low, this model actually saves 26.6% of the MACs compute.

$ python3 compress_classifier.py --resume=../ssl/checkpoints/checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar -a=resnet20_cifar ../../../data.cifar10 --summary=compute
<center> <img src="imgs/ch_compute_stats.png"></center>
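Footprint and compute can diverge because the MAC count of a convolution scales with the size of its output feature map, while its parameter count does not. The sketch below uses the standard counts for a dense convolution (no groups, bias ignored). The function names and layer shapes are illustrative assumptions, not values read from the checkpoint above.

```python
def conv2d_params(in_channels, out_channels, kernel_h, kernel_w):
    """Number of weights in a dense 2-D convolution (bias ignored)."""
    return in_channels * out_channels * kernel_h * kernel_w


def conv2d_macs(in_channels, out_channels, kernel_h, kernel_w, out_h, out_w):
    """Multiply-accumulates: one kernel application per output position."""
    return conv2d_params(in_channels, out_channels, kernel_h, kernel_w) * out_h * out_w


# An early layer (few channels, large feature map) versus a late layer
# (many channels, small feature map).
early = dict(in_channels=16, out_channels=16, kernel_h=3, kernel_w=3)
late = dict(in_channels=64, out_channels=64, kernel_h=3, kernel_w=3)

print(conv2d_params(**early), conv2d_macs(**early, out_h=32, out_w=32))
print(conv2d_params(**late), conv2d_macs(**late, out_h=8, out_w=8))
```

In this example both layers cost the same number of MACs, yet the early layer holds 16 times fewer weights, so pruning channels there saves compute while barely changing the footprint.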

8-bit quantization

This example performs 8-bit quantization of ResNet20 for CIFAR10. We've included in the git repository the checkpoint of a ResNet20 model that we've trained with 32-bit floats, so we'll take this model and quantize it:

$ python3 compress_classifier.py -a resnet20_cifar ../../../data.cifar10 --resume ../ssl/checkpoints/checkpoint_trained_dense.pth.tar --quantize --evaluate

The command-line above will save a checkpoint named quantized_checkpoint.pth.tar containing the quantized model parameters.

Explore the sample Jupyter notebooks

The set of notebooks that come with Distiller is described here, which also explains the steps to install the Jupyter notebook server.<br> After installing and running the server, take a look at the notebook covering pruning sensitivity analysis.

Sensitivity analysis is a long process and this notebook loads CSV files that are the output of several sessions of sensitivity analysis.

<center> <img src="imgs/resnet18-sensitivity.png"></center>

Set up the classification datasets

The sample application for compressing image classification DNNs, compress_classifier.py located at distiller/examples/classifier_compression, uses both CIFAR10 and ImageNet image datasets.<br>

The compress_classifier.py application will download CIFAR10 automatically the first time you try to use it (thanks to TorchVision). The example invocations used throughout Distiller's documentation assume that you have downloaded the images to directory distiller/../data.cifar10, but you can place the images anywhere you want. You tell compress_classifier.py where the dataset is located, or where you want the application to download it to, using a command-line parameter.

ImageNet needs to be downloaded manually, due to copyright issues. Facebook has created a set of scripts to help download and extract the dataset.

Again, the Distiller documentation assumes the following directory structure for the datasets, but this is just a suggestion:

distiller
  examples
    classifier_compression
data.imagenet/
    train/
    val/
data.cifar10/
    cifar-10-batches-py/
        batches.meta
        data_batch_1
        data_batch_2
        data_batch_3
        data_batch_4
        data_batch_5
        readme.html
        test_batch
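
A quick way to check whether a machine follows this suggested layout is sketched below. The helper name `missing_dataset_dirs` is hypothetical, and the root argument is assumed to be the directory that contains distiller and the data directories. A missing CIFAR10 directory is not necessarily a problem, since that dataset is downloaded on first use.

```python
from pathlib import Path

# Directories taken from the suggested layout above, relative to the
# directory that contains the distiller checkout.
EXPECTED_DIRS = [
    "data.imagenet/train",
    "data.imagenet/val",
    "data.cifar10/cifar-10-batches-py",
]


def missing_dataset_dirs(root):
    """Return the expected dataset directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # ".." assumes the script is run from inside the distiller directory.
    for rel in missing_dataset_dirs(".."):
        print("missing:", rel)
```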

Running the tests

We are currently light on tests, and this is an area where contributions will be much appreciated.<br> There are two types of tests: system tests and unit tests. To invoke the unit tests:

$ cd distiller/tests
$ pytest

We use CIFAR10 for the system tests, because its small size makes for quicker tests. To invoke the system tests, you need to provide a path to the CIFAR10 dataset which you've already downloaded. Alternatively, you may invoke full_flow_tests.py without specifying the location of the CIFAR10 dataset and let the test download the dataset (for the first invocation only). Note that --cifar10-path defaults to the current directory. <br> The system tests are not short, and take even longer if the test needs to download the dataset.

$ cd distiller/tests
$ python full_flow_tests.py --cifar10-path=<some_path>

The script exits with status 0 if all tests are successful, or status 1 otherwise.

Generating the HTML documentation site

Install mkdocs and the required packages by executing:

$ pip3 install -r doc-requirements.txt

To build the project documentation run:

$ cd distiller/docs-src
$ mkdocs build --clean

This will create a folder named 'site' which contains the documentation website. Open distiller/docs/site/index.html to view the documentation home page.

Built With

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

License

This project is licensed under the Apache License 2.0. See the LICENSE.md file for details.

Distiller Citation

If you used Distiller for your work, please use the following citation:

@misc{neta_zmora_2018_1297430,
  author       = {Neta Zmora and
                  Guy Jacob and
                  Gal Novik},
  title        = {Neural Network Distiller},
  month        = jun,
  year         = 2018,
  doi          = {10.5281/zenodo.1297430},
  url          = {https://doi.org/10.5281/zenodo.1297430}
}

Acknowledgments

Any published work is built on top of the work of many other people, and the credit belongs to too many people to list here.

Disclaimer

Distiller is released as a reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product. Additional algorithms and features are planned to be added to the library. Feedback and contributions from the open source and research communities are more than welcome.