[EMNLP 2024] Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning

https://arxiv.org/abs/2410.07461

If you find this repository useful, please consider citing:

@article{bandari2024c4datasetoptimalpruning,
      title={Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning},
      author={Abhinav Bandari and Lu Yin and Cheng-Yu Hsieh and Ajay Kumar Jaiswal and Tianlong Chen and Li Shen and Ranjay Krishna and Shiwei Liu},
      year={2024},
      eprint={2410.07461},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.07461},
}

Abstract

Network pruning has emerged as a potential solution to make LLMs cheaper to deploy. However, existing LLM pruning approaches universally rely on the C4 dataset as the calibration data for calculating pruning scores, leaving its optimality unexplored. In this study, we evaluate the choice of calibration data on LLM pruning, across a wide range of datasets that are most commonly used in LLM training and evaluation, including four pre-training datasets as well as three categories of downstream tasks encompassing nine datasets. Each downstream dataset is prompted with In-Context Learning (ICL) and Chain-of-Thought (CoT), respectively. Besides the already intriguing observation that the choice of calibration data significantly impacts the performance of pruned LLMs, our results also uncover several subtle and often unexpected findings, summarized as follows: (1) C4 is not the optimal choice for LLM pruning, even among commonly used pre-training datasets; (2) arithmetic datasets, when used as calibration data, perform on par with or even better than pre-training datasets; (3) pruning with downstream datasets does not necessarily help the corresponding downstream task, compared to pre-training data; (4) ICL is widely beneficial to all data categories, whereas CoT is only useful on certain tasks. Our findings shed light on the importance of carefully selecting calibration data for LLM pruning and pave the way for more efficient deployment of these powerful models in real-world applications. We release our code at: https://github.com/abx393/llm-pruning-calibration-data.

Setup

Installation instructions can be found in INSTALL.md.

Generate a HuggingFace user access token, create a file named pat.txt in the top-level directory of this repository, and write the token into that file.
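How the repository itself consumes pat.txt is not shown here. As a minimal sketch, assuming the token sits alone in the file, a script could read it like this. The helper name read_access_token is hypothetical and not part of the repository.

```python
from pathlib import Path


def read_access_token(path="pat.txt"):
    """Return the token stored in `path`, without surrounding whitespace.

    Hypothetical helper for illustration; assumes the file holds only the token.
    """
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"{path} is empty; write your access token into it")
    return token
```

Stripping whitespace guards against a trailing newline added by an editor, which would otherwise end up inside the token string.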

Usage

We provide a quick overview of the arguments:

Example

python main.py \
    --model huggyllama/llama-7b \
    --seed 0 \
    --prune_method wanda \
    --sparsity_ratio 0.5 \
    --sparsity_type unstructured \
    --save out/llama_7b/0/
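To illustrate why calibration data matters for a method like wanda, here is a toy sketch of the Wanda-style score, which multiplies each weight magnitude by the L2 norm of the corresponding input feature over the calibration tokens and then prunes the lowest scores within each output row. This is a pure-Python illustration on made-up numbers, not the repository's implementation; the function names and the two calibration sets are assumptions.

```python
import math


def wanda_scores(weight, activations):
    """Score each weight as |W[i][j]| * ||X[:, j]||_2.

    weight: rows are output neurons, columns are input features.
    activations: rows are calibration tokens, columns are input features.
    """
    n_in = len(weight[0])
    # L2 norm of each input feature across all calibration tokens.
    norms = [math.sqrt(sum(tok[j] ** 2 for tok in activations)) for j in range(n_in)]
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weight]


def prune_rows(weight, scores, sparsity_ratio):
    """Zero the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for w_row, s_row in zip(weight, scores):
        n_drop = int(len(w_row) * sparsity_ratio)
        # Indices of the n_drop smallest scores in this row.
        drop = set(sorted(range(len(w_row)), key=lambda j: s_row[j])[:n_drop])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned


weight = [[0.5, -2.0, 0.1, 1.0],
          [1.5, 0.2, -0.3, 0.4]]

# Two made-up calibration sets whose features differ in magnitude.
calib_a = [[1.0, 0.1, 4.0, 1.0], [1.0, 0.1, 3.0, 1.0]]
calib_b = [[0.1, 1.0, 0.1, 1.0], [0.1, 1.0, 0.1, 1.0]]

pruned_a = prune_rows(weight, wanda_scores(weight, calib_a), 0.5)
pruned_b = prune_rows(weight, wanda_scores(weight, calib_b), 0.5)
print(pruned_a)
print(pruned_b)
```

The same weight matrix ends up with a different sparsity pattern under each calibration set, because the activation norms change which weights look least important.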

We also have several example scripts to run experiments in various settings in the scripts directory.
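The contents of the scripts directory are not reproduced here. As a rough sketch of how a sweep could be assembled, the snippet below builds one command line per seed using only the flags from the example above. The helper build_command, the save_root parameter, and the choice to name the output directory after the seed are assumptions for illustration.

```python
import shlex


def build_command(seed, model="huggyllama/llama-7b", prune_method="wanda",
                  sparsity_ratio=0.5, sparsity_type="unstructured",
                  save_root="out/llama_7b"):
    """Return a shell-safe command string for one pruning run."""
    # Assumption: the last path component of --save is the seed.
    save_dir = f"{save_root}/{seed}/"
    args = [
        "python", "main.py",
        "--model", model,
        "--seed", str(seed),
        "--prune_method", prune_method,
        "--sparsity_ratio", str(sparsity_ratio),
        "--sparsity_type", sparsity_type,
        "--save", save_dir,
    ]
    return shlex.join(args)


# Print the commands for three seeds; run them with your shell or subprocess.
for seed in range(3):
    print(build_command(seed))
```

Building the argument list first and joining it with shlex.join keeps each value correctly quoted if a model name or path ever contains spaces.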

Experiments

Pruning Methods

Models

Calibration Datasets Used

Text:
Arithmetic QA:
Natural Language Inference:
Commonsense QA:

Acknowledgement

This repository is built upon the wanda and SparseGPT repositories.