Bamboo

We introduce Bamboo-v0.1, a 7B LLM that combines high activation sparsity with performance on par with Mistral-7B. This repo provides the details of the model.

Models

| Model | Transformers (HF) | PowerInfer/llama.cpp (GGUF) |
|-------|-------------------|-----------------------------|
| Bamboo-7B-base-v0.1 | Bamboo-base-v0.1 | Bamboo-base-v0.1-gguf |
| Bamboo-7B-DPO-v0.1 | Bamboo-DPO-v0.1 | Bamboo-DPO-v0.1-gguf |
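
As a quick-start illustration, the Transformers checkpoints can be loaded with the standard `transformers` API. This is a minimal sketch, not official usage instructions: the Hugging Face repo id below is an assumption inferred from the model names above, and `trust_remote_code` may or may not be needed depending on how the checkpoint is packaged.

```python
# Minimal sketch of loading the Transformers checkpoint.
# The repo id is an assumption; substitute the actual Hugging Face path
# from the table above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PowerInfer/Bamboo-base-v0.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision to fit a single consumer GPU
    device_map="auto",
    trust_remote_code=True,      # may be required if the repo ships custom modeling code
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```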

Performance with different sparsity

Recent studies (Zhang et al., 2024) have shown that LLMs exhibit activation sparsity: keeping only the top-k activated neurons in each layer leaves model quality largely intact. In this subsection, we report Bamboo's performance at different sparsity levels, using activation-magnitude thresholding to select neurons; a simplified sketch of this top-k masking is given after the table below. We evaluate perplexity on wikitext-2-raw-v1.

| Top-k Neurons | PPL |
|---------------|-----|
| 100% | 6.484 |
| 20%  | 6.484 |
| 15%  | 6.485 |
| 12%  | 6.497 |
| 10%  | 6.524 |
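
To make the notion of "top-k activation neurons" concrete, the sketch below keeps only the k% largest-magnitude intermediate activations of an FFN layer and zeroes the rest. It is a simplified stand-in, not Bamboo's actual implementation, and the layer sizes are assumptions.

```python
# Simplified sketch (not Bamboo's implementation) of evaluating an FFN at a
# given activation sparsity: only the top-k% intermediate neurons by
# activation magnitude are kept per token; the rest are zeroed out.
import torch
import torch.nn as nn


class TopKSparseFFN(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int, keep_ratio: float = 0.15):
        super().__init__()
        self.up = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.ReLU()
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.act(self.up(x))                        # (..., intermediate_size)
        k = max(1, int(self.keep_ratio * h.shape[-1]))  # neurons kept per token
        threshold = h.abs().topk(k, dim=-1).values[..., -1:]  # k-th largest magnitude
        h = torch.where(h.abs() >= threshold, h, torch.zeros_like(h))
        return self.down(h)


# Toy usage: assumed hidden/intermediate sizes, 15% of neurons kept.
ffn = TopKSparseFFN(hidden_size=4096, intermediate_size=11008, keep_ratio=0.15)
y = ffn(torch.randn(2, 8, 4096))  # 2 sequences of 8 tokens
print(y.shape)                    # torch.Size([2, 8, 4096])
```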

CDF of neuron activation distribution

Here we report the CDF of the neuron activation distribution of Bamboo-7B-base-v0.1 for every FFN layer. We profile neuron activations on the Cosmopedia dataset over roughly 1M tokens.

<img src="./figures/cdf.svg" alt="CDF of neurons distribution" width="400"/>
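
For reference, the sketch below shows one plausible way to turn recorded FFN activations into such a CDF: count how often each neuron fires over the profiling corpus, sort neurons from hottest to coldest, and plot the cumulative share of firings they cover. This is not the profiling code used for the figure; the threshold and the synthetic data are assumptions.

```python
# Rough sketch (not the actual profiling code) of computing a per-layer CDF
# of neuron activations from recorded FFN outputs.
import numpy as np
import matplotlib.pyplot as plt


def activation_cdf(acts: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """acts: (num_tokens, num_neurons) activations of one FFN layer."""
    fire_counts = (np.abs(acts) > threshold).sum(axis=0)  # how often each neuron fires
    sorted_counts = np.sort(fire_counts)[::-1]            # hottest neurons first
    return np.cumsum(sorted_counts) / max(sorted_counts.sum(), 1)


# Toy example with synthetic sparse activations; real profiling would record
# activations over ~1M tokens of Cosmopedia instead.
rng = np.random.default_rng(0)
acts = rng.standard_normal((1024, 11008)) * (rng.random(11008) < 0.2)
cdf = activation_cdf(acts)

plt.plot(np.linspace(0, 100, len(cdf)), cdf)
plt.xlabel("Top neurons (%)")
plt.ylabel("Cumulative share of activations")
plt.show()
```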

Model Performance

Our evaluation is based on the lm-evaluation-harness framework; a hedged sketch of invoking the harness is given after the table. The evaluation results are listed as follows:

| Model | Average | MMLU | Winogrande | TruthfulQA | Hellaswag | GSM8K | Arc-C | HumanEval | BBH |
|-------|---------|------|------------|------------|-----------|-------|-------|-----------|-----|
| Bamboo | 57.1 | 63.89 | 76.16 | 44.06 | 82.17 | 52.84 | 62.20 | 25.6 | 50.35 |
| Mistral-v0.1 | 56.5 | 62.65 | 79.24 | 42.62 | 83.32 | 40.18 | 61.43 | 26.21 | 56.35 |
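
Below is a hedged sketch of running part of this evaluation through the lm-evaluation-harness Python API. The repo id, task names, and settings are assumptions rather than the exact configuration used here, and some benchmarks (e.g. HumanEval) are typically scored with separate tooling.

```python
# Hedged sketch of evaluating the model with lm-evaluation-harness.
# Repo id, task list, and batch size are assumptions; consult the harness
# documentation for the exact setup behind the numbers above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=PowerInfer/Bamboo-base-v0.1,dtype=float16",  # assumed repo id
    tasks=["mmlu", "winogrande", "truthfulqa_mc2", "hellaswag", "gsm8k", "arc_challenge"],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```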

Inference Efficiency

https://github.com/SJTU-IPADS/Bamboo/assets/34213478/34c3024d-2dc1-4740-b12c-b26d82a5874d

<sub>Both PowerInfer and llama.cpp fully utilized the same hardware: an Intel Core i7-13700 (8 threads) and an NVIDIA RTX 2080Ti (11GB).</sub>

Below is a detailed comparison of inference speeds (tokens/second) achieved on Bamboo-7B-base with PowerInfer and llama.cpp across various hardware configurations.

| Scenario | Hardware | with PowerInfer (tokens/s) | with llama.cpp (tokens/s) | Speedup |
|----------|----------|----------------------------|---------------------------|---------|
| CPU + GPU hybrid | Core i7-13700 (8T) + RTX 2080Ti (11GB) | 33.50 | 7.64 | 4.38x |
| Full GPU | RTX 4090 (24GB) | 92.46 | 58.34 | 1.58x |
| Full CPU | Core i9-13900K (8T) | 9.94 | 4.78 | 2.08x |
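
For reference, the sketch below runs GGUF weights from Python through the llama-cpp-python bindings (plain llama.cpp backend). The local file name is an assumption, the GGUF file must be compatible with stock llama.cpp, and PowerInfer's own runner is needed to obtain the sparsity-aware speedups reported above.

```python
# Minimal sketch of running a GGUF checkpoint via llama-cpp-python.
# Model path is an assumption; this uses the plain llama.cpp backend,
# not PowerInfer's sparsity-aware runner.
from llama_cpp import Llama

llm = Llama(
    model_path="./bamboo-base-v0.1.gguf",  # assumed local file name
    n_gpu_layers=-1,                       # offload all layers to GPU if available
    n_ctx=2048,
)

out = llm("The capital of France is", max_tokens=32)
print(out["choices"][0]["text"])
```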

Contamination Results

Here we report our contamination results using https://github.com/fblgit/detect-pretrain-code-contamination/tree/winofix, with llama-2-7b as the reference model. A score greater than 0.85 indicates that the dataset was very likely seen during training.

| Model | TruthfulQA | Winogrande | ARC | MMLU | Hellaswag | GSM8K |
|-------|------------|------------|-----|------|-----------|-------|
| Bamboo | 0.22 | 0.02 | 0.08 | 0.24 | 0.02 | 0.99 |
| Mistral-v0.1 | 0.45 | 0.03 | 0.08 | 0.24 | 0.04 | 0.91 |

Note that GSM8K often scores very high on this toolkit; see https://huggingface.co/spaces/Yeyito/llm_contamination_detector.

Limitations

Future Work

License

The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial use.

Citation

Please cite Bamboo using the following BibTeX:

@misc{bamboo,
    title={Bamboo: Harmonizing Sparsity and Performance in Large Language Models},
    author={Yixin Song and Haotong Xie and Zeyu Mi and Li Ma and Haibo Chen},
    year={2024}
}