# RoLoRA
<div align=center><img src="./assets/rolora_logo.png" width="80%"></div>

This repository contains the code of RoLoRA, introduced in our work "RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization", published at EMNLP 2024.
## Abstract
In this work, we propose RoLoRA, the first LoRA-based scheme to apply rotation for outlier elimination and then fine-tune rotated outlier-free LLMs for effective weight-activation quantization. RoLoRA improves low-bit LoRA convergence and post-training quantization robustness in weight-activation quantization settings. RoLoRA is evaluated across various LLM series, tasks, and quantization settings, achieving up to a 29.5% absolute accuracy gain over the LoRA baseline for 4-bit weight-activation quantization of LLaMA2-13B on commonsense reasoning tasks.
<div align=center> <img width=80% src="./assets/rolora.png"/> </div>

## Citation
If you find our code useful for your research, please consider citing:
```bibtex
@article{huang2024rolora,
  title={RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization},
  author={Huang, Xijie and Liu, Zechun and Liu, Shih-Yang and Cheng, Kwang-Ting},
  journal={arXiv preprint arXiv:2407.08044},
  year={2024}
}
```
## Getting Started
### Huggingface Hub Login

```bash
pip install --upgrade huggingface_hub
huggingface-cli login
```
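For non-interactive environments (e.g. a remote training server), the CLI also accepts a token directly; `HF_TOKEN` below is just a placeholder for an access token created in your Hugging Face account settings:

```bash
# Non-interactive login; HF_TOKEN is a placeholder for your own access token.
huggingface-cli login --token "$HF_TOKEN"
```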
### Installation

```bash
pip install -r requirements.txt
```
If you encounter any problems installing `fast_hadamard_transform` with pip, please consider building it from source, as sketched below.
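A minimal sketch of a source build, assuming the package comes from the Dao-AILab/fast-hadamard-transform GitHub repository (check `requirements.txt` for the exact source pinned here); a working CUDA toolchain is required:

```bash
# Build fast_hadamard_transform from source (assumes a working CUDA toolchain).
git clone https://github.com/Dao-AILab/fast-hadamard-transform.git
cd fast-hadamard-transform
pip install .
```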
## Finetuning
For experiments applying RoLoRA to LLaMA2-7B, please run

```bash
sh rolora.sh
```

Remove `--rotate_down_proj` and `--rotate_mode 'hadamard'` for the LoRA baseline without rotation.
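For orientation, a training invocation might look like the sketch below; everything except the two rotation flags documented above is a placeholder, so please refer to `rolora.sh` for the actual entry point and arguments used in this repo.

```bash
# Illustrative sketch only: the training script, model path, and output
# directory are placeholders; see rolora.sh for the actual command.
# Remove the two rotation flags for the plain LoRA baseline.
python train.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --finetuning_type lora \
    --rotate_down_proj \
    --rotate_mode 'hadamard' \
    --output_dir ./output/rolora-llama2-7b
```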
## Merging
To merge the RoLoRA adapter into LLaMA2-7B, please run

```bash
sh merge_rolora.sh
```

Set `--adapter_name_or_path` and `--export_dir` to the path of the adapter files and the export target folder, respectively. Remove `--rotate_down_proj` and `--rotate_mode 'hadamard'` when merging a LoRA adapter without rotation.
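As a rough sketch of how these arguments fit together (the merge entry point and all paths are placeholders; see `merge_rolora.sh` for the actual command):

```bash
# Illustrative sketch only: script name and paths are placeholders.
# Drop the two rotation flags when merging a plain LoRA adapter.
python merge.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --adapter_name_or_path ./output/rolora-llama2-7b \
    --export_dir ./merged/rolora-llama2-7b \
    --rotate_down_proj \
    --rotate_mode 'hadamard'
```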
## Evaluation
For evaluation on the Zero-shot CommonSense Reasoning (ZCSR) and MMLU benchmarks, please run

```bash
sh eval_rolora.sh
```

Specify `$NAME`, `$WBITS`, and `$ABITS` for the target quantization settings. Use `--w_rtn` for RTN quantization on weights (the default is GPTQ). If you want to evaluate the quantized models on more tasks, modify `--task` to any task included in lm-evaluation-harness.
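A minimal example of how these variables might be set for a W4A4 run; the model path is a placeholder, and depending on how `eval_rolora.sh` reads the variables you may need to edit them at the top of the script instead of exporting them:

```bash
# Example W4A4 configuration (placeholder model path).
export NAME=./merged/rolora-llama2-7b   # model to evaluate
export WBITS=4                          # weight bit-width
export ABITS=4                          # activation bit-width
sh eval_rolora.sh
```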
## Checkpoint
We provide the checkpoints for the RoLoRA-finetuned LLMs in our Hugging Face repo; the evaluation logs are also included.
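To fetch a checkpoint from the command line, `huggingface-cli download` can be used once you know the repository name; `<CHECKPOINT_REPO_ID>` and the local directory below are placeholders:

```bash
# Download a released checkpoint to a local folder.
# <CHECKPOINT_REPO_ID> is a placeholder for the actual Hugging Face repo name.
huggingface-cli download <CHECKPOINT_REPO_ID> --local-dir ./checkpoints/rolora-llama2-7b
```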
## Results
Below are the results for LLaMA2-7B, LLaMA2-13B, and LLaMA3-8B on the zero-shot commonsense reasoning (ZCSR) and MMLU benchmarks.
| #Bits | Quantizer | Method | LLaMA-2 7B ZCSR Avg. | LLaMA-2 7B MMLU Avg. | LLaMA-2 13B ZCSR Avg. | LLaMA-2 13B MMLU Avg. | LLaMA-3 8B ZCSR Avg. | LLaMA-3 8B MMLU Avg. |
|---|---|---|---|---|---|---|---|---|
| FP16 | - | LoRA | 68.4 | 43.5 | 70.5 | 52.4 | 70.0 | 62.7 |
| W4A4 | RTN | LoRA | 35.8 | 23.5 | 34.4 | 24.2 | 36.7 | 23.3 |
| W4A4 | RTN | RoLoRA | 54.1 (↑18.3) | 25.8 (↑2.3) | 58.7 (↑24.3) | 30.5 (↑6.3) | 50.0 (↑13.3) | 32.1 (↑8.8) |
| W4A4 | GPTQ | LoRA | 37.0 | 23.5 | 34.4 | 24.4 | 36.6 | 23.9 |
| W4A4 | GPTQ | RoLoRA | 62.3 (↑25.3) | 31.0 (↑7.5) | 63.9 (↑29.5) | 38.9 (↑14.5) | 56.6 (↑20.0) | 38.5 (↑14.6) |
| W6A6 | RTN | LoRA | 65.3 | 35.9 | 67.3 | 47.3 | 67.7 | 55.3 |
| W6A6 | RTN | RoLoRA | 66.8 (↑1.5) | 40.5 (↑4.6) | 68.4 (↑1.1) | 47.7 (↑0.4) | 67.8 (↑0.1) | 59.4 (↑4.1) |
| W6A6 | GPTQ | LoRA | 65.5 | 35.7 | 68.0 | 47.6 | 67.8 | 54.3 |
| W6A6 | GPTQ | RoLoRA | 67.1 (↑1.6) | 40.8 (↑5.1) | 68.8 (↑0.8) | 47.9 (↑0.3) | 68.1 (↑0.3) | 59.4 (↑5.1) |
## Acknowledgement
This repo benefits from SpinQuant, QuaRot, LLaMA-Factory, and fast-hadamard-transform. Thanks for their wonderful work!

If you have any questions, feel free to contact Xijie Huang (xhuangbs AT connect.ust.hk, huangxijie1108 AT gmail.com).