llm-qat

This repository contains the training code for LLM-QAT, introduced in our paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models".

In this work, we investigate quantization-aware training for LLMs (LLM-QAT). In addition to quantizing weights and activations, we also quantize the KV cache, which is critical for increasing throughput and supporting long-sequence dependencies at current model sizes. We experiment with LLaMA models of sizes 7B, 13B, and 30B, at quantization levels down to 4 bits. We observe improvements of up to ~20 points over training-free methods when quantizing weights, activations, and the KV cache to 4, 8, and 4 bits, respectively.

<div align=center> <img width=80% src="./llm_qat_overview.jpg"/> </div>
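To make the idea of "quantizing to N bits" concrete, here is a minimal sketch of fake quantization: values are rounded onto an N-bit integer grid and mapped back to floats, so the rest of the computation sees the rounding error. It assumes a symmetric min-max scheme with one scale per list of values; the function name `fake_quantize` is hypothetical, and the quantizer used in this repository (its granularity, clipping, and gradient handling during training) may differ.

```python
def fake_quantize(values, num_bits):
    """Quantize floats to a signed num_bits grid, then dequantize.

    Assumption: symmetric min-max quantization with a single scale
    for the whole list. This is an illustration, not the repo's code.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2 for a signed grid")
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0 for _ in values]         # nothing to scale
    scale = max_abs / qmax                   # float value of one integer step
    quantized = []
    for v in values:
        q = round(v / scale)                 # nearest integer level
        q = max(-qmax, min(qmax, q))         # clamp to the symmetric range
        quantized.append(q * scale)          # map back to float
    return quantized


# Example: the same values at 4 and 8 bits.
example_values = [0.9, -0.35, 0.05, -1.2]
four_bit = fake_quantize(example_values, 4)
eight_bit = fake_quantize(example_values, 8)
print(four_bit)
print(eight_bit)
```

The 8-bit result stays closer to the original values than the 4-bit one, which is why lower bit widths generally need training-time adaptation to recover accuracy.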

Citation

If you find our code useful for your research, please consider citing:

@article{liu2023llm,
    title={LLM-QAT: Data-Free Quantization Aware Training for Large Language Models},
    author={Liu, Zechun and Oguz, Barlas and Zhao, Changsheng and Chang, Ernie and Stock, Pierre and Mehdad, Yashar and Shi, Yangyang and Krishnamoorthi, Raghuraman and Chandra, Vikas},
    journal={arXiv preprint arXiv:2305.17888},
    year={2023}
}

Run

1. Requirements:

2. Steps to run:

(1) Synthesize data:

(2) Quantization-aware training:

Quantized LLaMA-7B Models

The results reported in the paper were obtained with Meta's internal LLaMA codebase. We reproduced our experiments with the Hugging Face codebase and released that code here; the results are close to those in the paper. For clarity, the following table lists the zero-shot common-sense reasoning accuracy of the open-source version.

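| #bits (W-A-KV) | boolq | piqa | siqa | hellaswag | winogrande | arc_easy | arc_challenge | obqa | avg. |
|---|---|---|---|---|---|---|---|---|---|
| 4-8-4 | 72.4 | 76.9 | 47.6 | 70.5 | 65.8 | 67.5 | 44.4 | 50.4 | 62.0 |
| 4-8-8 | 73.6 | 77.4 | 48.5 | 73.0 | 68.8 | 68.4 | 45.5 | 53.4 | 63.6 |
| 4-6-16 | 70.8 | 76.0 | 46.9 | 70.9 | 65.2 | 66.7 | 43.5 | 49.0 | 61.1 |
| 4-8-16 | 72.9 | 77.9 | 47.9 | 72.9 | 68.0 | 69.1 | 44.8 | 55.6 | 63.6 |
| 4-16-16 | 74.2 | 78.2 | 48.3 | 73.3 | 68.2 | 69.7 | 45.6 | 54.8 | 64.0 |
| 8-8-4 | 74.1 | 78.6 | 49.3 | 73.3 | 67.9 | 70.1 | 45.5 | 52.4 | 63.9 |
| 8-8-8 | 75.5 | 79.1 | 48.7 | 75.5 | 70.1 | 73.1 | 47.2 | 56.0 | 65.6 |
| 8-8-16 | 75.7 | 79.1 | 48.9 | 75.8 | 70.4 | 72.8 | 47.8 | 56.3 | 65.9 |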
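
As a reading aid for the table, the sketch below parses a "W-A-KV" label into its three bit widths and recomputes the average for one row. It assumes that "avg." is the unweighted mean of the eight task accuracies, which the table values are consistent with; because the per-task numbers are already rounded, the recomputed mean can differ from the listed average in the last digit. The names `BitConfig`, `parse_bit_config`, and `mean_accuracy` are hypothetical helpers, not part of this repository.

```python
from typing import NamedTuple, Sequence


class BitConfig(NamedTuple):
    weight_bits: int
    activation_bits: int
    kv_cache_bits: int


def parse_bit_config(label: str) -> BitConfig:
    """Split a 'W-A-KV' label such as '4-8-4' into integer bit widths."""
    parts = label.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected 'W-A-KV', got {label!r}")
    weight, activation, kv_cache = (int(p) for p in parts)
    return BitConfig(weight, activation, kv_cache)


def mean_accuracy(scores: Sequence[float]) -> float:
    """Unweighted mean of per-task accuracies (assumed definition of 'avg.')."""
    return sum(scores) / len(scores)


# The 8-8-16 row of the table, in column order from boolq to obqa.
config = parse_bit_config("8-8-16")
row_8_8_16 = [75.7, 79.1, 48.9, 75.8, 70.4, 72.8, 47.8, 56.3]
recomputed = mean_accuracy(row_8_8_16)
print(config, round(recomputed, 2))
```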

Acknowledgement

This code is partially based on the Hugging Face Transformers repository.

Contact

Zechun Liu, Reality Labs, Meta Inc (zechunliu at meta dot com)

Barlas Oguz, Meta AI (barlaso at meta dot com)

Changsheng Zhao, Reality Labs, Meta Inc (cszhao at meta dot com)

Relevant Projects

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases [Paper] [Code]

SpinQuant: LLM Quantization with Learned Rotations [Paper] [Code]

License

LLM-QAT is currently licensed under CC-BY-NC 4.0.