πŸ”„ RoLoRA

<div align=center><img src="./assets/rolora_logo.png" width="80%"></div>

This repository contains the code for RoLoRA, introduced in our paper "RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization", published at EMNLP 2024.

🌟 Abstract

In this work, we propose RoLoRA, the first LoRA-based scheme to apply rotation for outlier elimination and then fine-tune rotated, outlier-free LLMs for effective weight-activation quantization. RoLoRA can improve low-bit LoRA convergence and post-training quantization robustness in weight-activation quantization settings. RoLoRA is evaluated across various LLM series, tasks, and quantization settings, achieving up to a 29.5% absolute accuracy gain for 4-bit weight-activation quantized LLaMA2-13B on commonsense reasoning tasks compared to the LoRA baseline.

<div align=center> <img width=80% src="./assets/rolora.png"/> </div>
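Roughly speaking, the rotation trick relies on computational invariance: for an orthogonal rotation $R$ (e.g., a Hadamard matrix, so that $RR^\top = I$), folding $R$ into a linear layer leaves its output unchanged,

$$(XR)(R^{\top}W) = X(RR^{\top})W = XW,$$

while the rotated activations $XR$ spread outlier channels across many dimensions. The flatter activation distribution is what makes low-bit weight-activation quantization of the fine-tuned model more robust. (This is a simplified sketch of the idea; see the paper for the exact rotation placement.)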

🌿 Citation

If you find our code useful for your research, please consider citing:

```bibtex
@article{huang2024rolora,
  title={RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization},
  author={Huang, Xijie and Liu, Zechun and Liu, Shih-Yang and Cheng, Kwang-Ting},
  journal={arXiv preprint arXiv:2407.08044},
  year={2024}
}
```

πŸ› οΈ Getting Started

Huggingface Hub Login

```bash
pip install --upgrade huggingface_hub
huggingface-cli login
```
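Logging in is required to download gated base models such as the LLaMA series. On a machine without an interactive prompt, you can pass an access token directly; `$HF_TOKEN` below is a placeholder for your own token:

```bash
# Non-interactive login; replace $HF_TOKEN with a read-scoped access token
# created under your Hugging Face account settings.
huggingface-cli login --token "$HF_TOKEN"
```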

Installation

```bash
pip install -r requirements.txt
```

If you encounter any problems installing fast_hadamard_transform using pip, please consider building it from source.
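For reference, building from source typically looks like the following (assuming the upstream Dao-AILab repository and a CUDA toolkit compatible with your PyTorch build):

```bash
# Build fast_hadamard_transform from source; requires nvcc matching
# the CUDA version of the installed PyTorch.
git clone https://github.com/Dao-AILab/fast-hadamard-transform.git
cd fast-hadamard-transform
pip install .
```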

πŸš‚ Finetuning

For experiments applying RoLoRA on LLaMA2-7B, please run

```bash
sh rolora.sh
```

Remove `--rotate_down_proj` and `--rotate_mode 'hadamard'` for the LoRA baseline without rotation.
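For orientation, the rotation-related part of such a run looks roughly like the sketch below. Only the two rotation flags are taken from this README; the entry point and the remaining arguments are hypothetical placeholders, so check `rolora.sh` for the actual invocation.

```bash
# Hypothetical sketch only; everything except the two rotation flags
# is a placeholder -- see rolora.sh for the real command.
python train.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --rotate_down_proj \
    --rotate_mode 'hadamard' \
    --output_dir ./saves/rolora-llama2-7b
# Dropping --rotate_down_proj and --rotate_mode 'hadamard' reproduces the
# plain LoRA baseline described above.
```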

βŒ› Merging

To merge the RoLoRA adapter into LLaMA2-7B, please run

```bash
sh merge_rolora.sh
```

Specify `--adapter_name_or_path` and `--export_dir` as the path to the adapter files and the export target folder, respectively. Remove `--rotate_down_proj` and `--rotate_mode 'hadamard'` when merging a LoRA adapter without rotation.
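As a concrete (hypothetical) example, the merge arguments would be filled in along these lines; the paths and the entry point are placeholders, not the actual contents of `merge_rolora.sh`:

```bash
# Hypothetical sketch; only the four flags below appear in this README.
python export_model.py \
    --adapter_name_or_path ./saves/rolora-llama2-7b \
    --export_dir ./merged/rolora-llama2-7b \
    --rotate_down_proj \
    --rotate_mode 'hadamard'
# Omit the last two flags when merging a plain LoRA adapter.
```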

πŸ” Evaluation

For evaluation on the zero-shot commonsense reasoning (ZCSR) and MMLU benchmarks, please run

```bash
sh eval_rolora.sh
```

Specify `$NAME`, `$WBITS`, and `$ABITS` for the target quantization setting. Use `--w_rtn` for RTN quantization of the weights (the default is GPTQ). To evaluate the quantized models on more tasks, set `--task` to any task included in lm-evaluation-harness.
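For example, a W4A4 evaluation of a merged model could be configured as below; whether these are read from the environment or edited at the top of `eval_rolora.sh` is an assumption here, so adapt it to the script:

```bash
# Hypothetical W4A4 setting for a merged RoLoRA model (GPTQ weight
# quantization by default; add --w_rtn inside the script for RTN).
export NAME=./merged/rolora-llama2-7b   # model to evaluate
export WBITS=4                          # weight bit-width
export ABITS=4                          # activation bit-width
sh eval_rolora.sh
```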

πŸ’Ύ Checkpoint

We provide checkpoints for the RoLoRA-finetuned LLMs in the accompanying Hugging Face repo. The evaluation logs are also included.
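To fetch a checkpoint locally, `huggingface-cli` can download the whole repo; the repo id below is a placeholder for the actual checkpoint repo:

```bash
# REPO_ID is a placeholder; substitute the actual RoLoRA checkpoint repo id.
REPO_ID=org-name/rolora-llama2-7b
huggingface-cli download "$REPO_ID" --local-dir ./checkpoints/rolora
```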

πŸ“š Results

Below are the results for LLaMA2-7B, LLaMA2-13B, and LLaMA3-8B on the zero-shot commonsense reasoning (ZCSR) and MMLU benchmarks.

| #Bits | Quantizer | Method | LLaMA-2 7B ZCSR Avg. | LLaMA-2 7B MMLU Avg. | LLaMA-2 13B ZCSR Avg. | LLaMA-2 13B MMLU Avg. | LLaMA-3 8B ZCSR Avg. | LLaMA-3 8B MMLU Avg. |
|---|---|---|---|---|---|---|---|---|
| FP16 | - | LoRA | 68.4 | 43.5 | 70.5 | 52.4 | 70.0 | 62.7 |
| W4A4 | RTN | LoRA | 35.8 | 23.5 | 34.4 | 24.2 | 36.7 | 23.3 |
| W4A4 | RTN | RoLoRA | 54.1 (↑18.3) | 25.8 (↑2.3) | 58.7 (↑24.3) | 30.5 (↑6.3) | 50.0 (↑13.3) | 32.1 (↑8.8) |
| W4A4 | GPTQ | LoRA | 37.0 | 23.5 | 34.4 | 24.4 | 36.6 | 23.9 |
| W4A4 | GPTQ | RoLoRA | 62.3 (↑25.3) | 31.0 (↑7.5) | 63.9 (↑29.5) | 38.9 (↑14.5) | 56.6 (↑20.0) | 38.5 (↑14.6) |
| W6A6 | RTN | LoRA | 65.3 | 35.9 | 67.3 | 47.3 | 67.7 | 55.3 |
| W6A6 | RTN | RoLoRA | 66.8 (↑1.5) | 40.5 (↑4.6) | 68.4 (↑1.1) | 47.7 (↑0.4) | 67.8 (↑0.1) | 59.4 (↑4.1) |
| W6A6 | GPTQ | LoRA | 65.5 | 35.7 | 68.0 | 47.6 | 67.8 | 54.3 |
| W6A6 | GPTQ | RoLoRA | 67.1 (↑1.6) | 40.8 (↑5.1) | 68.8 (↑0.8) | 47.9 (↑0.3) | 68.1 (↑0.3) | 59.4 (↑5.1) |

πŸ’Œ Acknowledgement

This repo benefits from SpinQuant, QuaRot, LLaMA-Factory, and fast-hadamard-transform. Thanks for their wonderful work!

If you have any questions, feel free to contact Xijie Huang (xhuangbs AT connect.ust.hk, huangxijie1108 AT gmail.com).