<p align="center"> <img src="figures/logo.png" width="20%"> <br> </p>

## Introduction

LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [[arXiv](https://arxiv.org/abs/2305.18403)]

Mingyang Zhang<sup>1,2</sup>, Hao Chen<sup>1</sup>, Chunhua Shen<sup>1,3</sup>, Zhen Yang<sup>1</sup>, Linlin Ou<sup>2</sup>, Xinyi Yu<sup>2</sup>, Bohan Zhuang<sup>1</sup>
Zhejiang University<sup>1</sup>, Zhejiang University of Technology<sup>2</sup>, Ant Group<sup>3</sup>

This repository contains code for reproducing LoRAPrune. LoRAPrune iteratively prunes large pre-trained models (LPMs) in a memory-efficient manner. Specifically, it relies on a LoRA-guided pruning criterion that estimates weight importance from the weights and gradients of LoRA, rather than from the gradients of the pre-trained weights.

<p align="center"> <img src="figures/criterion.png" width="90%"> <br> </p>
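For intuition, here is a minimal PyTorch sketch of such a criterion. The specific way the LoRA gradients are combined below (`B.grad @ A + B @ A.grad` as a stand-in for the gradient of the frozen weight) and all tensor names are illustrative assumptions; see the paper for the exact formulation. The key point is that only the small LoRA factors ever carry gradients, so no full-size gradient of `W` is ever stored.

```python
import torch

def lora_guided_importance(W, A, B):
    """Sketch: score each entry of a frozen weight W using only the
    LoRA factors A, B and their gradients (W.grad is never stored).

    Shapes: W (out, in), B (out, r), A (r, in); B.grad and A.grad
    must be populated by a backward pass.
    """
    # Effective weight applied in the forward pass: W + B @ A
    effective = W + B @ A
    # Assumed approximation of dL/dW built from the LoRA gradients;
    # this is what avoids materializing a full-size gradient for W.
    approx_grad = B.grad @ A + B @ A.grad
    # Taylor-style saliency: prune weights with small |w * dL/dw|.
    return (effective * approx_grad).abs()

# Toy usage: out=4, in=6, rank r=2
W = torch.randn(4, 6)                      # frozen, requires_grad=False
A = torch.randn(2, 6, requires_grad=True)  # LoRA down-projection
B = torch.randn(4, 2, requires_grad=True)  # LoRA up-projection
x = torch.randn(6)
loss = ((W + B @ A) @ x).sum()
loss.backward()                            # populates A.grad and B.grad
scores = lora_guided_importance(W, A, B)   # shape (4, 6)
```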

## Updates

## TODO List

## Quick Start

### Installation

```bash
pip install -r requirement.txt
```

### Prune LPMs

```bash
sh script/prune.sh
```

This script compresses the LLaMA-7B model. You need to download the LLaMA-7B pre-trained weights first; the calibration dataset is downloaded and sampled automatically. You can also prune larger LPMs, e.g., LLaMA-13B, LLaMA-30B, and LLaMA-65B. To save GPU memory, you can optionally quantize the pre-trained weights to 8 bits by adding `--load_in_8bit`.
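For reference, 8-bit loading in a typical Hugging Face Transformers setup looks like the sketch below. This is illustrative, not the repository's code: the model path is a placeholder, and the `load_in_8bit` keyword requires the `bitsandbytes` package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b"  # placeholder: your downloaded weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
# load_in_8bit=True quantizes the frozen pre-trained weights to int8 at
# load time (needs the bitsandbytes package), roughly halving GPU memory
# relative to fp16, while the LoRA adapters stay in higher precision.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,
    device_map="auto",
)
```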

### Evaluate results

```bash
sh script/evaluate.sh
```

After pruning, you can evaluate the pruned model on the WikiText-2 and PTB datasets.
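For reference, perplexity on WikiText-2 is typically computed with a sliding-window loop like the sketch below. This is a generic recipe, not the repository's script; it assumes `model` and `tokenizer` are already loaded (e.g., as in the snippet above) and uses LLaMA's 2048-token context window.

```python
import torch
from datasets import load_dataset

# Standard WikiText-2 test split from the Hugging Face hub.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seqlen = 2048  # LLaMA context length
nlls = []
for i in range(0, enc.input_ids.size(1) - seqlen, seqlen):
    ids = enc.input_ids[:, i : i + seqlen].to(model.device)
    with torch.no_grad():
        # With labels=ids, the model returns the mean next-token
        # cross-entropy over the window.
        loss = model(ids, labels=ids).loss
    nlls.append(loss * seqlen)
ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * seqlen))
print(f"WikiText-2 perplexity: {ppl.item():.2f}")
```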

## License

For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.

## Citation

If you find this project useful, please cite:

```bibtex
@misc{zhang2023pruning,
      title={Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning},
      author={Mingyang Zhang and Hao Chen and Chunhua Shen and Zhen Yang and Linlin Ou and Xinyi Yu and Bohan Zhuang},
      year={2023},
      eprint={2305.18403},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```