RPTQ: Reorder-Based Post-Training Quantization for Large Language Models

Large language models (LLMs) have shown exceptional performance on a wide range of tasks. However, deploying LLMs is challenging due to their enormous size. One of the main obstacles to quantizing LLM activations is the large range difference across channels, which hurts the accuracy and compression ratio of the quantized model. In our paper, we propose a novel reorder-based quantization approach called RPTQ. RPTQ rearranges the channels in the activations and then quantizes them in clusters, thereby reducing the impact of the range differences between channels. With RPTQ, we push LLMs to 3-bit activation quantization for the first time.
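To make the idea concrete, here is a minimal NumPy sketch of reorder-based quantization. This is an illustration, not the repository's actual implementation: the helper name `rptq_quantize`, the simple k-means on per-channel (min, max) statistics, and the per-cluster asymmetric quantizer are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def rptq_quantize(acts, n_clusters=4, n_bits=3, seed=0):
    """Sketch of reorder-based quantization (hypothetical helper):
    group channels with similar value ranges into clusters, reorder
    them so each cluster is contiguous, then quantize each cluster
    with its own scale and zero-point."""
    # Per-channel min/max over calibration activations (tokens x channels).
    ch_min = acts.min(axis=0)
    ch_max = acts.max(axis=0)
    features = np.stack([ch_min, ch_max], axis=1)

    # Simple k-means on (min, max) to cluster channels with similar ranges.
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(20):
        labels = np.argmin(((features[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = features[labels == k].mean(axis=0)

    # Reorder channels so each cluster is contiguous, then quantize per cluster.
    order = np.argsort(labels, kind="stable")
    reordered = acts[:, order]
    qmax = 2 ** n_bits - 1
    out = np.empty_like(reordered)
    start = 0
    for k in range(n_clusters):
        size = int((labels == k).sum())
        if size == 0:
            continue
        block = reordered[:, start:start + size]
        lo, hi = float(block.min()), float(block.max())
        scale = max(hi - lo, 1e-8) / qmax
        q = np.clip(np.round((block - lo) / scale), 0, qmax)
        out[:, start:start + size] = q * scale + lo  # dequantize for inspection
        start += size
    return out, order
```

Because each cluster gets its own quantization parameters, a few wide-range channels no longer force a coarse scale onto every narrow-range channel, which is the failure mode of naive per-tensor activation quantization.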

Overview

Update

Requirements

Python packages

Usage

The RPTQ approach can be applied to OPT models:

python main.py opt-1.3b --wbits 4 --abits 4 --eval_ppl --tasks lambada_openai,piqa,arc_easy,arc_challenge,openbookqa,boolq

To quantize only the K/V cache:

python main.py opt-1.3b --wbits 4 --abits 4 --only_quant_kv --eval_ppl --tasks lambada_openai,piqa,arc_easy,arc_challenge,openbookqa,boolq

To quantize larger models, please use --multigpu:

python main.py opt-66b --wbits 4 --abits 4 --only_quant_kv --eval_ppl --tasks lambada_openai,piqa,arc_easy,arc_challenge,openbookqa,boolq --multigpu
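For intuition on what the --only_quant_kv mode does, the sketch below shows one simple way to store a cached key or value tensor as low-bit unsigned integers with a scale and zero-point. This is a hedged illustration under assumed names (`quantize_kv`, `dequantize_kv`) and a per-tensor asymmetric scheme; it is not the repository's actual kernel.

```python
import numpy as np

def quantize_kv(tensor, n_bits=4):
    """Hypothetical illustration of asymmetric K/V-cache quantization:
    map floats into [0, 2**n_bits - 1] with a per-tensor scale/offset."""
    qmax = 2 ** n_bits - 1
    lo, hi = float(tensor.min()), float(tensor.max())
    scale = max(hi - lo, 1e-8) / qmax
    q = np.clip(np.round((tensor - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize_kv(q, scale, lo):
    """Recover an approximate float tensor from the quantized cache."""
    return q.astype(np.float32) * scale + lo
```

Quantizing only the K/V cache targets the memory that grows with sequence length during generation, which is why it degrades accuracy less than quantizing all activations at the same bit-width.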

Results

Perplexity

| Model | OPT-1.3b |  |  | OPT-6.7b |  |  | OPT-13b |  |  | OPT-30b |  |  | OPT-66b |  |  | OPT-175b |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Task | WIKI | PT | C4 | WIKI | PT | C4 | WIKI | PT | C4 | WIKI | PT | C4 | WIKI | PT | C4 | WIKI | PT | C4 |
| FP16 | 14.63 | 16.96 | 14.72 | 10.86 | 13.09 | 11.74 | 10.13 | 12.34 | 11.20 | 9.56 | 11.84 | 10.69 | 9.34 | 11.36 | 10.28 | 8.34 | 12.01 | 10.13 |
| W4A16 | 14.78 | 17.21 | 14.92 | 11.18 | 13.62 | 12.07 | 10.29 | 12.45 | 11.27 | 9.55 | 11.91 | 10.74 | 9.30 | 11.42 | 10.31 | 8.37 | 12.31 | 10.26 |
| W4A8 | 15.39 | 17.79 | 15.48 | 11.21 | 13.74 | 12.11 | 10.90 | 13.40 | 11.62 | 10.22 | 12.41 | 11.01 | 9.46 | 11.73 | 10.57 | 8.43 | 12.24 | 10.49 |
| W4A4 | 16.88 | 19.23 | 16.55 | 12.00 | 15.17 | 12.85 | 12.74 | 15.76 | 14.71 | 11.15 | 14.11 | 13.48 | 12.23 | 18.87 | 15.93 | 10.60 | 15.59 | 12.28 |
| W4A4KV | 15.26 | 17.65 | 15.37 | 11.26 | 13.44 | 12.03 | 10.59 | 12.80 | 11.54 | 9.99 | 12.18 | 11.01 | 9.75 | 11.64 | 10.61 | 8.40 | 12.38 | 10.54 |
| W4A3KV | 17.22 | 19.94 | 16.92 | 11.92 | 14.13 | 12.61 | 11.15 | 13.90 | 12.04 | 11.62 | 14.95 | 11.96 | 10.88 | 14.69 | 11.36 | 9.39 | 13.45 | 11.27 |
| W3A3KV | 18.45 | 21.33 | 18.26 | 12.42 | 14.48 | 13.13 | 11.47 | 14.08 | 12.41 | 11.76 | 14.98 | 12.22 | 11.47 | 15.03 | 11.75 | 10.03 | 13.82 | 11.30 |

Zero-shot tasks

| Task | lambada_openai |  |  |  |  | piqa |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|
| Model | 1.3b | 6.7b | 13b | 30b | 66b | 1.3b | 6.7b | 13b | 30b | 66b |
| FP16 | 57.98% | 61.84% | 68.60% | 71.41% | 67.14% | 72.47% | 74.53% | 76.87% | 78.01% | 78.12% |
| W4A16 | 57.46% | 60.78% | 68.50% | 71.37% | 67.06% | 71.59% | 74.80% | 76.93% | 78.29% | 78.18% |
| W4A8 | 52.39% | 67.35% | 62.44% | 64.99% | 67.02% | 69.69% | 75.89% | 75.46% | 76.93% | 77.52% |
| W4A4 | 49.34% | 64.93% | 60.23% | 63.92% | 68.50% | 68.66% | 75.40% | 73.55% | 76.16% | 77.14% |
| W4A4KV | 52.90% | 67.39% | 62.77% | 64.89% | 69.99% | 69.26% | 76.00% | 74.42% | 76.65% | 76.98% |
| W4A3KV | 47.02% | 64.97% | 61.05% | 59.20% | 66.23% | 68.22% | 75.73% | 73.23% | 67.46% | 74.21% |
| W3A3KV | 42.84% | 64.11% | 60.02% | 58.33% | 65.28% | 68.22% | 74.64% | 74.10% | 67.51% | 75.13% |

| Task | arc_easy |  |  |  |  | arc_challenge |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|
| Model | 1.3b | 6.7b | 13b | 30b | 66b | 1.3b | 6.7b | 13b | 30b | 66b |
| FP16 | 51.05% | 58.03% | 61.91% | 65.31% | 64.68% | 29.69% | 33.61% | 35.66% | 38.05% | 38.99% |
| W4A16 | 51.17% | 57.02% | 61.82% | 65.10% | 64.89% | 30.03% | 32.59% | 35.49% | 37.96% | 38.99% |
| W4A8 | 48.35% | 60.18% | 60.94% | 63.46% | 64.60% | 26.36% | 34.04% | 35.58% | 37.45% | 38.82% |
| W4A4 | 47.55% | 56.90% | 58.41% | 62.12% | 63.76% | 25.85% | 34.30% | 33.95% | 36.17% | 37.20% |
| W4A4KV | 47.76% | 57.74% | 58.54% | 63.59% | 63.67% | 27.64% | 33.95% | 34.21% | 37.37% | 37.71% |
| W4A3KV | 46.29% | 56.69% | 56.10% | 48.44% | 59.00% | 26.02% | 33.95% | 33.95% | 30.71% | 36.77% |
| W3A3KV | 44.02% | 55.59% | 53.74% | 50.42% | 57.65% | 26.53% | 32.16% | 32.50% | 30.71% | 34.98% |

| Task | openbookqa |  |  |  |  | boolq |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|
| Model | 1.3b | 6.7b | 13b | 30b | 66b | 1.3b | 6.7b | 13b | 30b | 66b |
| FP16 | 33.00% | 38.00% | 39.00% | 40.20% | 41.60% | 57.73% | 67.03% | 65.90% | 70.45% | 70.85% |
| W4A16 | 31.80% | 37.40% | 39.20% | 40.60% | 42.00% | 58.99% | 59.72% | 66.66% | 70.70% | 70.55% |
| W4A8 | 32.40% | 38.00% | 38.60% | 39.40% | 41.80% | 46.88% | 65.93% | 66.57% | 70.64% | 71.07% |
| W4A4 | 32.60% | 38.40% | 38.00% | 38.60% | 42.00% | 41.37% | 65.44% | 58.47% | 67.70% | 70.24% |
| W4A4KV | 32.60% | 38.40% | 38.00% | 39.80% | 41.60% | 43.33% | 62.11% | 62.47% | 68.22% | 70.79% |
| W4A3KV | 32.80% | 36.80% | 37.00% | 34.00% | 39.40% | 42.84% | 61.31% | 57.76% | 61.74% | 67.06% |
| W3A3KV | 28.40% | 35.20% | 37.20% | 32.40% | 38.60% | 46.23% | 60.79% | 65.07% | 63.08% | 67.49% |

Citation

If you use our RPTQ approach in your research, please cite our paper:

@misc{yuan2023rptq,
      title={RPTQ: Reorder-based Post-training Quantization for Large Language Models}, 
      author={Zhihang Yuan and Lin Niu and Jiawei Liu and Wenyu Liu and Xinggang Wang and Yuzhang Shang and Guangyu Sun and Qiang Wu and Jiaxiang Wu and Bingzhe Wu},
      year={2023},
      eprint={2304.01089},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}