
<p align="center"> <img src="figures/logo.png" width="20%"> <br> </p> <div align="center"> <h1>FLAP</h1> <h3>[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models</h3> </div> <p align="center"> <img width="100%" alt="image" src="figures/overview.png"> </p>

Introduction

Fluctuation-based Adaptive Structured Pruning for Large Language Models [[arXiv](https://arxiv.org/abs/2312.11983)]
Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang
Institute of Automation, Chinese Academy of Sciences

Why FLAP:

- Retraining-free: pruned components are compensated with a bias term, so the compressed model works without fine-tuning.
- Adaptive: the compressed structure is determined per layer and per module from the fluctuation of input features, rather than by a uniform ratio.

Supported LLMs:

- LLaMA family (the examples in this repository target LLaMA-7B).

Table of Contents

- Introduction
- Quick Start
- Configuration Instruction
- Language Modeling Evaluation
- Zero-shot Evaluation
- Acknowledgement
- Citation

Quick Start

Installation

Installation instructions can be found in INSTALL.md.

Minimal Example

```bash
bash script/llama_7b.sh $GPU_ID
```

This script compresses the LLaMA-7B model with FLAP, pruning roughly 20% of its parameters. The pre-trained weights and the calibration dataset are downloaded automatically, so no manual downloads are needed; the first run therefore takes some extra time. Pass the GPU to use as the first argument, e.g., `bash script/llama_7b.sh 0`.
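Once the script finishes, the pruned model is written to the directory passed as `--save_model`. The snippet below is a minimal, hypothetical sketch of reloading such a checkpoint; it assumes the whole pruned model object is serialized with `torch.save`, and the file name is our placeholder, so check main.py for the actual save format:

```python
import torch

# Hypothetical reload of a FLAP-pruned checkpoint. Because pruning changes
# the architecture, we assume the full model object was serialized with
# torch.save; the file name below is a placeholder, not the repo's actual one.
save_dir = "llm_weights/flap_p0.2_WIFV_ALAM_llama_7b/"
model = torch.load(save_dir + "pruned_model.pt", map_location="cpu")
model.eval()  # FLAP is retraining-free: the pruned model is used as-is
```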

Configuration Instruction

Pruning

LLaMA-7B pruning with ~20% of parameters removed:

```bash
python main.py \
    --model decapoda-research/llama-7b-hf \
    --prune_method flap \
    --pruning_ratio 0.2 \
    --remove_heads -1 \
    --metrics WIFV \
    --structure AL-AM \
    --nsamples 1024 \
    --save_model "llm_weights/flap_p0.2_WIFV_ALAM_llama_7b/" \
    --eval
```

Arguments:

- `--model`: Hugging Face identifier or local path of the model to prune.
- `--prune_method`: pruning method to run (`flap` here).
- `--pruning_ratio`: fraction of parameters to prune (0.2 ≈ 20%).
- `--remove_heads`: number of attention heads to remove when the head budget is fixed manually; `-1` leaves it to the adaptive structure search.
- `--metrics`: importance metric; `WIFV` is the fluctuation-based metric from the paper (see the sketch after this list).
- `--structure`: how the pruning budget is distributed; `AL-AM` adapts it across both layers and modules.
- `--nsamples`: number of calibration samples.
- `--save_model`: directory where the pruned model is saved.
- `--eval`: evaluate the pruned model after pruning.
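For intuition about the `WIFV` metric: FLAP scores a structured component by how much its input features fluctuate across calibration samples, weighted by the corresponding weight norms, and compensates removed components with a bias term. The following is a simplified, self-contained sketch of that idea for a single linear layer (our own variable names, not the repository's implementation):

```python
import torch

def fluctuation_score(X: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """WIFV-style importance of each input channel of a linear layer y = x @ W.T + b.

    X: (n_samples, in_features) calibration activations entering the layer.
    W: (out_features, in_features) weight matrix.
    Returns one score per input channel: the sample variance of the channel,
    weighted by the squared L2 norm of the weight column that consumes it.
    """
    variance = X.var(dim=0, unbiased=True)  # per-channel fluctuation
    weight_norm_sq = W.pow(2).sum(dim=0)    # squared column norm per input channel
    return variance * weight_norm_sq

# Toy usage: drop the lowest-scoring 20% of input channels and fold their
# average contribution into the bias ("baseline bias compensation").
X = torch.randn(1024, 64)   # 1024 calibration samples, 64 input channels
W = torch.randn(128, 64)
b = torch.zeros(128)

scores = fluctuation_score(X, W)
keep = scores.argsort(descending=True)[: int(0.8 * X.shape[1])]
mask = torch.zeros(X.shape[1], dtype=torch.bool)
mask[keep] = True

b_compensated = b + (X.mean(dim=0) * ~mask) @ W.T  # absorb pruned channels' baseline
W_pruned = W[:, mask]                              # smaller (128, 51) weight matrix
```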

After pruning and post-training, we follow [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for evaluation.
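For instance, with a recent release of the harness (`pip install lm-eval`), a zero-shot run over a typical common-sense task set could look like the sketch below. The `pretrained=` path is illustrative; a structurally pruned checkpoint may need the repository's own loading code rather than the stock Hugging Face loader:

```bash
lm_eval --model hf \
    --model_args pretrained=decapoda-research/llama-7b-hf \
    --tasks boolq,piqa,hellaswag,winogrande,arc_easy,arc_challenge,openbookqa \
    --device cuda:0 \
    --batch_size 8
```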

Language Modeling Evaluation

A brief quantitative comparison of language modeling performance across the LLaMA family:

<p align="center"> <img src="figures/language_modeling.png" width="100%"> <br> </p>

Zero-shot Evaluation

Brief quantitative zero-shot performance results for LLaMA-7B:

<p align="center"> <img src="figures/zero_shot.png" width="100%"> <br> </p>

More results can be found in the paper.

Acknowledgement

Citation

If you find this project useful, please cite:

```bibtex
@misc{an2023fluctuationbased,
      title={Fluctuation-based Adaptive Structured Pruning for Large Language Models},
      author={Yongqi An and Xu Zhao and Tao Yu and Ming Tang and Jinqiao Wang},
      year={2023},
      eprint={2312.11983},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```