LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment

Introduction

LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment [arXiv]

The Large Language Model Compression Benchmark (LLMCBench) is a rigorously designed benchmark that provides an in-depth analysis of LLM compression algorithms.

Installation

git clone https://github.com/AboveParadise/LLMCBench.git
cd LLMCBench

conda create -n llmcbench python=3.9
conda activate llmcbench
pip install -r requirements.txt

Usage

This repo contains code for evaluating models on the MMLU, MNLI, QNLI, Wikitext2, advGLUE, and TruthfulQA datasets, and for measuring FLOPs.
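Each benchmark in this section has its own script under scripts/. When every benchmark is wanted in one pass, a small wrapper can call them in turn. The following is a minimal sketch, not part of the repo: the function name run_all_benchmarks and the variable LLMCBENCH_SCRIPTS are hypothetical, and it assumes each script has already been edited with the model and data paths you need and is run from the repository root.

```shell
# Script names taken from the Usage section of this README.
LLMCBENCH_SCRIPTS="run_mmlu.sh run_mnli.sh run_qnli.sh run_wikitext2.sh run_advglue.sh run_tqa.sh run_flops.sh"

# run_all_benchmarks is a hypothetical helper, not something the repo ships.
# Usage: run_all_benchmarks [script_dir]   (script_dir defaults to "scripts")
run_all_benchmarks() {
  script_dir="${1:-scripts}"
  failures=0
  for script in $LLMCBENCH_SCRIPTS; do
    path="$script_dir/$script"
    if [ ! -f "$path" ]; then
      echo "missing: $path" >&2
      failures=$((failures + 1))
      continue
    fi
    echo "running: $path"
    # Keep going after a failure so one broken benchmark does not block the rest.
    if ! bash "$path"; then
      echo "failed: $path" >&2
      failures=$((failures + 1))
    fi
  done
  echo "done: $failures script(s) missing or failed"
  # Succeed only if every script was found and exited cleanly.
  [ "$failures" -eq 0 ]
}
```

After sourcing the snippet, calling run_all_benchmarks scripts from the repository root runs the seven scripts in the order listed. Continuing past a failed script is a design choice for long evaluation runs; replace the inner if with a plain bash "$path" || return 1 to stop at the first error instead.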

Testing MMLU

bash scripts/run_mmlu.sh
Overview of Arguments:

Testing MNLI

bash scripts/run_mnli.sh
Overview of Arguments:

Testing QNLI

bash scripts/run_qnli.sh
Overview of Arguments:

Testing Wikitext2

bash scripts/run_wikitext2.sh
Overview of Arguments:

Testing advGLUE

bash scripts/run_advglue.sh
Overview of Arguments:

Testing TruthfulQA

bash scripts/run_tqa.sh
Overview of Arguments:

Testing FLOPs (floating point operations)

bash scripts/run_flops.sh
Overview of Arguments:

Acknowledgements

In addition to the code in this repo, we also use EleutherAI/lm-evaluation-harness, a framework for few-shot evaluation of language models, for evaluation.

Citation

If you find our project useful or relevant to your research, please cite our paper:

@inproceedings{yang2024llmcbench,
  title={LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment},
  author={Yang, Ge and He, Changyi and Guo, Jinyang and Wu, Jianyu and Ding, Yifu and Liu, Aishan and Qin, Haotong and Ji, Pengliang and Liu, Xianglong},
  booktitle={Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024}
}