Home

Awesome

BESA

This repository contains code to reproduce the key results of the paper "BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation", accepted at International Conference on Learning Representations (ICLR), 2024.

Dependencies

lm-evaluation-harness

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .

Customized Cuda Operator

cd models/ops
python setup.py install

Usage

Here is the command to run baseline experiments followed by perplexity evaluations on WikiText2, PTB, C4 and zero-shot tasks. See also the CMD-argument documentation.

bash main_exps.sh

Hardware Simulation

We utilize the ViTCoD accelerator to achieve the speed-up that can be obtained through the sparse accelerator. Here is the command to simulate the average runtime of each module in the pruned model.

python main_hw.py \
--model-name MODEL_DIR or HF_MODEL_NAME
--func SIMULATE_CHOICE (q, k, v, o, gate, up, and down are available)

Others

In the experiment section of our paper, we present the results of row-wise sparsity, which customize sparsity for each row of target layer's weight within in the block. Additionally, we provide an extension presenting the outcomes of layer-wise sparsity, where each row of the target layer is assigned uniform sparsity. You can find the commands to execute the layer-wise sparsity experiments in the main_exps.sh script. Below, we present the perplexity results for the Wikitext2 dataset.

1-7B1-13B1-30B1-65B2-7B2-13B2-70B
Dense5.685.094.103.535.474.883.31
SparseGPT7.226.215.334.606.996.024.25
Wanda7.266.155.254.606.925.974.22
BESA (layer-wise)7.046.075.164.516.775.854.14
BESA (row-wise)6.865.925.004.336.605.754.09

Acknowledgement

This repo benefits from SparseGPT, Prodigy, and ViTCoD. Thanks for their wonderful works.