Trained Ternary Quantization

PyTorch implementation of Trained Ternary Quantization, a method for replacing the full precision weights of a neural network with ternary values. I tested it on the Tiny ImageNet dataset, which consists of 64x64 images and has 200 classes.

The quantization roughly proceeds as follows (a code sketch of this loop is given after the list).

  1. Train a model of your choice as usual (or take a trained model).
  2. Copy all full precision weights that you want to quantize. Then do the initial quantization:
    in the model, replace them with ternary values {-1, 0, +1} using some heuristic.
  3. Repeat until convergence:
    • Make the forward pass with the quantized model.
    • Compute gradients for the quantized model.
    • Preprocess the gradients and apply them to the copy of full precision weights.
    • Requantize the model using the changed full precision weights.
  4. Throw away the copy of full precision weights and use the quantized model.
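
Below is a minimal sketch of steps 2 and 3 in plain PyTorch. It is not the code from this repo: the threshold heuristic, the 0.05 ratio, the variable names, and the choice to quantize every parameter are assumptions for illustration, and the gradient preprocessing is left out for brevity.

```python
import torch

def quantize(fp_weights, threshold_ratio=0.05):
    # Step 2 heuristic: weights whose magnitude is above a threshold become
    # +1 or -1, everything else becomes 0.  The threshold_ratio value is an
    # assumed hyperparameter, not one taken from this repo.
    threshold = threshold_ratio * fp_weights.abs().max()
    ternary = torch.zeros_like(fp_weights)
    ternary[fp_weights > threshold] = 1.0
    ternary[fp_weights < -threshold] = -1.0
    return ternary

def ttq_step(model, fp_copy, x, y, loss_fn, lr):
    # Step 3: forward and backward passes use the quantized weights,
    # but the update is applied to the full precision copy.
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for q_param, fp_param in zip(model.parameters(), fp_copy):
            fp_param -= lr * q_param.grad        # update the fp copy
            q_param.copy_(quantize(fp_param))    # requantize the model
    return loss.item()
```

Here fp_copy is assumed to be a list of full precision tensors kept in the same order as model.parameters(); after training it is simply discarded (step 4).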

Results

I believe these results can be improved by spending more time on hyperparameter optimization.

| model | accuracy, % | top-5 accuracy, % | number of parameters |
| --- | --- | --- | --- |
| DenseNet-121 | 74 | 91 | 7151176 |
| TTQ DenseNet-121 | 66 | 87 | ~7M 2-bit, 88% are zeros |
| small DenseNet | 49 | 75 | 440264 |
| TTQ small DenseNet | 40 | 67 | ~0.4M 2-bit, 38% are zeros |
| SqueezeNet | 52 | 77 | 827784 |
| TTQ SqueezeNet | 36 | 63 | ~0.8M 2-bit, 66% are zeros |

Implementation details

How to reproduce results

For example, for small DenseNet:

  1. Download the Tiny ImageNet dataset and extract it to the ~/data folder.
  2. Run python utils/move_tiny_imagenet_data.py to prepare the data.
  3. Go to vanilla_densenet_small/. Run train.ipynb to train the model as usual.
    Or skip this step and use model.pytorch_state (a model I have already trained).
  4. Go to ttq_densenet_small/.
  5. Run train.ipynb to do TTQ.
  6. Run test_and_explore.ipynb to explore the quantized model.

To use this with your own data, edit utils/input_pipeline.py and change the model architecture in files like densenet.py and get_densenet.py as needed.
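
For reference, a replacement input pipeline could look roughly like the sketch below, assuming the usual torchvision ImageFolder layout. The paths, transforms, and the function name are illustrative, not what utils/input_pipeline.py actually contains.

```python
import torch
from torchvision import datasets, transforms

def get_loaders(data_dir, batch_size=64):
    # Tiny ImageNet images are 64x64; swap in whatever transforms and
    # folder layout your own dataset needs.
    train_transform = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    train_set = datasets.ImageFolder(data_dir + '/train', transform=train_transform)
    val_set = datasets.ImageFolder(data_dir + '/val', transform=transforms.ToTensor())
    train_loader = torch.utils.data.DataLoader(
        train_set, batch_size=batch_size, shuffle=True, num_workers=4)
    val_loader = torch.utils.data.DataLoader(
        val_set, batch_size=batch_size, shuffle=False, num_workers=4)
    return train_loader, val_loader
```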

Requirements