Nonuniform-to-Uniform Quantization

This repository contains the training code of N2UQ, introduced in our CVPR 2022 paper "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation".

In this work, we propose a quantization method that learns nonuniform input thresholds to maintain the strong representational ability of nonuniform quantization, while producing uniformly spaced output levels, so that model inference remains as hardware-friendly and efficient as uniform quantization.

<div align=center> <img width=60% src="https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization/blob/main/U2UQ_github.jpg"/> </div>

To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) to handle the otherwise intractable backward derivative calculation w.r.t. the threshold parameters.

The N2UQ quantizer is defined as follows.

Forward pass:

<div align=center> <img width=40% src="https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization/blob/main/Formula01.jpg"/> </div>

Backward pass:

<div align=center> <img width=40% src="https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization/blob/main/Formula02.jpg"/> </div>
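
To make the mapping concrete, here is a minimal PyTorch sketch of the idea. It is illustrative only: the class name `N2UActQuant`, the clipped straight-through input gradient, and the per-threshold surrogate gradient are assumptions of this sketch, not the repository's code and not the exact G-STE rule derived in the paper.

```python
import torch


class N2UActQuant(torch.autograd.Function):
    """Illustrative nonuniform-to-uniform activation quantizer (a sketch,
    not the repository's implementation).

    Learnable thresholds T_1 < ... < T_{2^b - 1} partition the input range
    nonuniformly, while the output is the uniform integer level in
    {0, ..., 2^b - 1}, so inference can use ordinary uniform arithmetic.
    The backward rule below is a simple clipped straight-through surrogate
    plus a crude per-threshold gradient; it stands in for, and is not, the
    exact G-STE derivation from the paper.
    """

    @staticmethod
    def forward(ctx, x, thresholds):
        ctx.save_for_backward(x, thresholds)
        # Output level = number of (sorted) thresholds the input exceeds.
        return (x.unsqueeze(-1) >= thresholds).sum(dim=-1).to(x.dtype)

    @staticmethod
    def backward(ctx, grad_out):
        x, t = ctx.saved_tensors
        # Input gradient: pass-through inside the quantization range (clipped STE).
        grad_x = grad_out * ((x >= t[0]) & (x <= t[-1])).to(grad_out.dtype)
        # Threshold gradient (surrogate): inputs lying in the bin directly above
        # T_i are the ones whose output level changes when T_i moves.
        upper = torch.cat([t[1:], t.new_tensor([float("inf")])])
        grad_t = torch.stack([
            -(grad_out * ((x >= lo) & (x < hi)).to(grad_out.dtype)).sum()
            for lo, hi in zip(t, upper)
        ])
        return grad_x, grad_t


if __name__ == "__main__":
    x = torch.rand(4, 8, requires_grad=True)
    t = torch.nn.Parameter(torch.tensor([0.25, 0.5, 0.75]))  # 2-bit: 3 thresholds
    N2UActQuant.apply(x, t).sum().backward()
    print(x.grad.shape, t.grad)
```

The point the sketch illustrates is that the thresholds are nonuniform and learnable on the input side, while the emitted levels are plain integers, so inference needs only uniform arithmetic.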

Moreover, we propose an L1-norm-based, entropy-preserving weight regularization for weight quantization.

Citation

If you find our code useful for your research, please consider citing:

@inproceedings{liu2022nonuniform,
  title={Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation},
  author={Liu, Zechun and Cheng, Kwang-Ting and Huang, Dong and Xing, Eric and Shen, Zhiqiang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Run

1. Requirements:

2. Data:

3. Pretrained Models:

4. Steps to run:

(1) For ResNet architectures:

(2) For MobileNet architectures:

Models

1. ResNet

| Network | Methods | W2/A2 | W3/A3 | W4/A4 |
| --- | --- | --- | --- | --- |
| ResNet-18 | PACT | 64.4 | 68.1 | 69.2 |
| | DoReFa-Net | 64.7 | 67.5 | 68.1 |
| | LSQ | 67.6 | 70.2 | 71.1 |
| | N2UQ | 69.4 Model-Res18-2bit | 71.9 Model-Res18-3bit | 72.9 Model-Res18-4bit |
| | N2UQ * | 69.7 Model-Res18-2bit | 72.1 Model-Res18-3bit | 73.1 Model-Res18-4bit |
| ResNet-34 | LSQ | 71.6 | 73.4 | 74.1 |
| | N2UQ | 73.3 Model-Res34-2bit | 75.2 Model-Res34-3bit | 76.0 Model-Res34-4bit |
| | N2UQ * | 73.4 Model-Res34-2bit | 75.3 Model-Res34-3bit | 76.1 Model-Res34-4bit |
| ResNet-50 | PACT | 64.4 | 68.1 | 69.2 |
| | LSQ | 67.6 | 70.2 | 71.1 |
| | N2UQ | 75.8 Model-Res50-2bit | 77.5 Model-Res50-3bit | 78.0 Model-Res50-4bit |
| | N2UQ * | 76.4 Model-Res50-2bit | 77.6 Model-Res50-3bit | 78.0 Model-Res50-4bit |
<!-- | LQ-Nets | 64.9 | 68.2 | 69.3 | -->

Note that N2UQ without * quantizes all convolutional layers except the first (input) convolutional layer.

N2UQ with * quantizes all convolutional layers except the first (input) convolutional layer and the three downsampling layers.

W2/A2, W3/A3, and W4/A4 denote the settings where both weights and activations are quantized to 2, 3, and 4 bits, respectively; the reported numbers are Top-1 accuracy (%).
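
As a reference for what this quantization scope means in code, below is a minimal sketch for a torchvision-style ResNet. The helper `replace_convs`, the predicate `should_skip`, and the factory `make_qconv` / placeholder `QuantConv2d` are hypothetical names used only for illustration, not part of this repository.

```python
import torch.nn as nn


def replace_convs(model: nn.Module, make_qconv, should_skip):
    """Swap Conv2d layers for a quantized counterpart, except those for which
    should_skip(qualified_name) is True. Illustrative helper only."""
    for parent_name, parent in list(model.named_modules()):
        for child_name, child in list(parent.named_children()):
            full_name = f"{parent_name}.{child_name}" if parent_name else child_name
            if isinstance(child, nn.Conv2d) and not should_skip(full_name):
                setattr(parent, child_name, make_qconv(child))


# N2UQ (no *): skip only the first input conv of a torchvision ResNet.
#   should_skip = lambda name: name == "conv1"
# N2UQ *: additionally keep the downsampling shortcut convs full-precision.
#   should_skip = lambda name: name == "conv1" or "downsample" in name
# Usage (QuantConv2d is a placeholder for whatever quantized conv module is used):
#   model = torchvision.models.resnet18()
#   replace_convs(model, make_qconv=QuantConv2d, should_skip=should_skip)
```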

2. MobileNet

| Network | Methods | W4/A4 |
| --- | --- | --- |
| MobileNet-V2 | N2UQ | 72.1 Model-MBV2-4bit |

Contact

Zechun Liu, HKUST (zliubq at connect.ust.hk)