Quantization Variation

This repository contains the code for reproducing our work "Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision", published in Transactions on Machine Learning Research (TMLR).

Abstract

In this paper, we attribute the difficulty of low-bit quantization-aware training for transformers to their unique variation behaviors, which differ significantly from those of ConvNets. Based on a comprehensive quantitative analysis, we observe variation in three hierarchies: quantization sensitivity that varies across modules, outliers in the static weight and activation distributions, and oscillation in the dynamic parameter fluctuations. These variations make quantization-aware training (QAT) of transformers unstable and degrade performance. We explore best practices for alleviating the influence of variation during low-bit transformer QAT and propose a variation-aware quantization scheme. We extensively verify that our scheme alleviates the variation and improves transformer performance across various models and tasks. Our solution substantially improves the 2-bit Swin-T, achieving a 3.35% accuracy improvement over previous state-of-the-art methods on ImageNet-1K.
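
For readers new to low-bit QAT, the sketch below shows the kind of uniform fake quantizer (an LSQ-style learned step size with a straight-through estimator) that the --wbits/--abits options in the commands below refer to. This is a generic illustration under our own naming, not the variation-aware scheme proposed in the paper.

import torch

def fake_quantize(x, step, num_bits=4, signed=True):
    # Generic LSQ-style uniform fake quantization (illustration only).
    # `step` is a learnable scale; rounding uses a straight-through
    # estimator so gradients reach both `x` and `step`.
    if signed:                        # e.g. weights
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:                             # e.g. non-negative activations
        qmin, qmax = 0, 2 ** num_bits - 1
    x_scaled = torch.clamp(x / step, qmin, qmax)
    # forward pass rounds, backward pass sees the identity (STE)
    x_rounded = x_scaled + (torch.round(x_scaled) - x_scaled).detach()
    return x_rounded * step

# toy usage: 4-bit weights with a learnable step size
w = torch.randn(64, 64, requires_grad=True)
step = torch.tensor(0.05, requires_grad=True)
w_q = fake_quantize(w, step, num_bits=4, signed=True)
w_q.sum().backward()                  # gradients flow to both w and step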

<div align=center> <img width=80% src="variation.png"/> </div>

Citation

If you find our code useful for your research, please consider citing:

@article{
    huang2024quantization,
    title={Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision},
    author={Xijie Huang and Zhiqiang Shen and Pingcheng Dong and Kwang-Ting Cheng},
    journal={Transactions on Machine Learning Research},
    year={2024},
    url={https://openreview.net/forum?id=MHfoA0Qf6g}
}

Preparation

Requirements

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
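
As a quick sanity check of the installation, the snippet below (ours, not part of the repo) should run without errors. Note that old timm releases such as 0.3.2 are known to clash with some newer PyTorch builds, so you may need to match versions.

# check that the pinned timm version and a CUDA-enabled PyTorch build are present
import torch
import timm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("timm :", timm.__version__)
assert timm.__version__ == "0.3.2", "this repo pins timm==0.3.2"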

Data and Soft Label

Prepare the ImageNet-1K dataset (a folder containing train and val subfolders, passed via --data below) and the pre-generated soft labels used for training (the command below expects ./FKD_soft_label_500_crops_marginal_smoothing_k_5). The soft labels come from the FKD project referenced in the Acknowledgement.

Run

Preparing the full-precision baseline model

QAT is initialized from a full-precision checkpoint, passed to --finetune below. Train a full-precision model yourself or use the 32-bit (W32A32) weights listed in the Models table.

Quantization-aware training

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_VVTQ.py \
--dist-url 'tcp://127.0.0.1:10001' \
--dist-backend 'nccl' \
--multiprocessing-distributed --world-size 1 --rank 0 \
--model deit_tiny_patch16_224_quant --batch-size 512 --lr 5e-4 \
--warmup-epochs 0 --min-lr 0 --wbits 4 --abits 4 --reg \
--softlabel_path ./FKD_soft_label_500_crops_marginal_smoothing_k_5 \
--finetune [path to full precision baseline model] \
--save_checkpoint_path ./DeiT-T-4bit --log ./log/DeiT-T-4bit.log \
--data [imagenet-folder with train and val folders]
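
The --softlabel_path option points to pre-generated FKD-style soft labels. As a rough illustration of how such soft labels typically enter the objective (the actual loss and label format in train_VVTQ.py may differ; soft_label_cross_entropy and its inputs are our own names), training minimizes a cross-entropy against the soft distribution rather than one-hot labels:

import torch
import torch.nn.functional as F

def soft_label_cross_entropy(logits, soft_targets):
    # Cross-entropy against pre-computed soft labels (illustration only).
    # logits:       (N, num_classes) model outputs
    # soft_targets: (N, num_classes) class probabilities from the soft-label files
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# toy example with random tensors in place of real data
logits = torch.randn(8, 1000)
soft_targets = torch.softmax(torch.randn(8, 1000), dim=-1)
loss = soft_label_cross_entropy(logits, soft_targets)
print(loss.item())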

Evaluation

CUDA_VISIBLE_DEVICES=0 python train_VVTQ.py \
--model deit_tiny_patch16_224_quant --batch-size 512 --wbits 4 --abits 4 \
--resume [path to W4A4 DeiT-T ckpt] --evaluate --log ./log/DeiT-T-W4A4.log \
--data [imagenet-folder with train and val folders]
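
If --resume fails to load a downloaded checkpoint, it can help to inspect the file first. The snippet below assumes a standard torch.save dictionary layout with an optional state_dict key; the actual key names and file name of this repo's checkpoints may differ.

import torch

# replace with the path to a downloaded or saved checkpoint (placeholder name)
ckpt_path = "path/to/W4A4-DeiT-T-checkpoint.pth"

# load on CPU so no GPU is needed for inspection
ckpt = torch.load(ckpt_path, map_location="cpu")

state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print("top-level keys:", list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))

# quantized models usually carry extra quantizer parameters (e.g. step sizes)
for name in list(state_dict)[:10]:
    value = state_dict[name]
    print(name, tuple(value.shape) if hasattr(value, "shape") else type(value))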

Models

| Model  | W bits | A bits | Top-1 accuracy (%) | Weights | Logs |
|--------|--------|--------|--------------------|---------|------|
| DeiT-T | 32     | 32     | 73.75              | link    | -    |
| DeiT-T | 4      | 4      | 74.71              | link    | link |
| DeiT-T | 3      | 3      | 71.22              | link    | link |
| DeiT-T | 2      | 2      | 59.73              | link    | link |
| SReT-T | 32     | 32     | 75.81              | link    | -    |
| SReT-T | 4      | 4      | 76.99              | link    | link |
| SReT-T | 3      | 3      | 75.40              | link    | link |
| SReT-T | 2      | 2      | 67.53              | link    | link |
| Swin-T | 32     | 32     | 81.0               | link    | -    |
| Swin-T | 4      | 4      | 82.42              | link    | link |
| Swin-T | 3      | 3      | 81.37              | link    | link |
| Swin-T | 2      | 2      | 77.66              | link    | link |
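
To put the W-bit column in perspective, here is a back-of-the-envelope calculation of weight storage. The parameter counts below are approximate public figures, not taken from this repo, and quantizer side parameters and any FP32 layers are ignored.

# rough weight-storage comparison at different weight bit-widths
params_millions = {"DeiT-T": 5.7, "Swin-T": 28.3}  # approximate parameter counts

def weight_storage_mb(params_m, bits):
    # params * bits / 8 bytes, converted to megabytes
    return params_m * 1e6 * bits / 8 / 1e6

for model, p in params_millions.items():
    sizes = {f"W{bits}": weight_storage_mb(p, bits) for bits in (32, 4, 2)}
    print(model, {k: f"~{v:.1f} MB" for k, v in sizes.items()})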

Acknowledgement

This repo benefits from FKD and LSQuantization. Thanks for their wonderful work!

If you have any questions, feel free to contact Xijie Huang (xhuangbs AT connect.ust.hk, huangxijie1108 AT gmail.com).