BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration [Paper]

BitMoD is an algorithm-hardware co-design framework for LLM acceleration using bit-serial hardware with mixture-of-datatypes. It supports diverse precision and data types with a flexible accuracy-efficiency trade-off.

This repository contains the source code for reproducing the experiments of our HPCA'25 paper "BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration".

News

Getting Started

Each folder in this repo corresponds to a separate set of experiments in the BitMoD paper. Go to the folder you need and follow its README to run those experiments.

Perplexity Results

| Model | Precision | Quant Method | WikiText2 PPL | C4 PPL |
|---|---|---|---|---|
| Llama-2-7B | fp16 | - | 5.47 | 6.97 |
| Llama-2-7B | w4g128 | AWQ + BitMoD | 5.59 | 7.09 |
| Llama-2-7B | w4g128 | OmniQ + BitMoD | 5.56 | 7.06 |
| Llama-2-7B | w3g128 | AWQ + BitMoD | 6.07 | 7.64 |
| Llama-2-7B | w3g128 | OmniQ + BitMoD | 5.86 | 7.55 |
| Llama-2-13B | fp16 | - | 4.88 | 6.47 |
| Llama-2-13B | w4g128 | AWQ + BitMoD | 4.96 | 6.55 |
| Llama-2-13B | w4g128 | OmniQ + BitMoD | 4.95 | 6.55 |
| Llama-2-13B | w3g128 | AWQ + BitMoD | 5.27 | 6.88 |
| Llama-2-13B | w3g128 | OmniQ + BitMoD | 5.17 | 6.84 |
| Llama-2-70B | fp16 | - | 3.32 | 5.52 |
| Llama-2-70B | w4g128 | AWQ + BitMoD | 3.40 | 5.57 |
| Llama-2-70B | w3g128 | AWQ + BitMoD | 3.70 | 5.77 |
| Llama-3-8B | fp16 | - | 6.14 | 8.88 |
| Llama-3-8B | w4g128 | AWQ + BitMoD | 6.50 | 9.32 |
| Llama-3-8B | w4g128 | OmniQ + BitMoD | 6.45 | 9.34 |
| Llama-3-8B | w3g128 | AWQ + BitMoD | 7.79 | 11.07 |
| Llama-3-8B | w3g128 | OmniQ + BitMoD | 7.56 | 11.06 |
| Llama-3-70B | fp16 | - | 2.85 | 6.73 |
| Llama-3-70B | w4g128 | AWQ + BitMoD | 3.19 | 6.94 |
| Llama-3-70B | w3g128 | AWQ + BitMoD | 4.48 | 7.76 |
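For readers comparing rows, the sketch below shows the standard definition of perplexity (the exponential of the mean per-token negative log-likelihood) and a helper for the relative increase over the fp16 baseline. It is an illustration only, not the evaluation code used in this repo; the function names `perplexity` and `relative_increase` are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_nlls: per-token negative log-likelihoods in nats (natural log).
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_increase(ppl_quant, ppl_fp16):
    """Fractional perplexity increase of a quantized model over fp16."""
    return (ppl_quant - ppl_fp16) / ppl_fp16


# Llama-2-7B on WikiText2: fp16 vs. w4g128 AWQ + BitMoD (values from the table)
print(f"{relative_increase(5.59, 5.47):.1%}")
```

With the Llama-2-7B WikiText2 numbers, the w4g128 AWQ + BitMoD result is roughly 2.2% above the fp16 perplexity.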

Code Structure

Repo Root
|---- SmoothQuant-BitMoD   # Running SmoothQuant with basic INT and our proposed BitMoD data types
|---- AWQ-BitMoD           # Running AWQ with basic INT and our proposed BitMoD data types
|---- OmniQuant-BitMoD     # Running OmniQuant with basic INT and our proposed BitMoD data types
|---- bitmod-quant         # Weight-only quantization with different precision and data types (e.g. INT, FP, BitMoD)
|---- bitmod-sim           # BitMoD accelerator simulator
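As background for the weight-only quantization folders, here is a minimal sketch of group-wise asymmetric integer quantization, the basic INT baseline the tree above mentions. It assumes that a precision label such as w4g128 means 4-bit weights quantized in groups of 128, which is the common convention but is not spelled out in this README. The sketch does not implement the BitMoD data types, and the function names are hypothetical rather than taken from `bitmod-quant`.

```python
def quantize_group(weights, bits):
    """Asymmetric uniform quantization of one group.

    Returns the dequantized (reconstructed) values so the error is easy to inspect.
    """
    qmax = (1 << bits) - 1                  # e.g. 15 for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax
    if scale == 0.0:                        # constant group: nothing to quantize
        return list(weights)
    out = []
    for w in weights:
        q = round((w - w_min) / scale)      # integer code in [0, qmax]
        q = max(0, min(qmax, q))
        out.append(q * scale + w_min)       # map the code back to a real value
    return out


def quantize_groupwise(weights, bits=4, group_size=128):
    """Quantize a flat list of weights, one scale and offset per group."""
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(quantize_group(weights[start:start + group_size], bits))
    return out


weights = [i / 10 for i in range(256)]      # toy data, two groups of 128
deq = quantize_groupwise(weights, bits=4, group_size=128)
print(max(abs(a - b) for a, b in zip(weights, deq)))
```

Each group gets its own scale and offset, so an outlier in one group does not widen the quantization step of the others; smaller groups reduce error at the cost of storing more scales.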

Citation

@article{chen2025hpca,
  title={{BitMoD}: Bit-serial Mixture-of-Datatype LLM Acceleration},
  author={Yuzong Chen and Ahmed F. AbouElhamayed and Xilai Dai and Yang Wang and Marta Andronic and George A. Constantinides and Mohamed S. Abdelfattah},
  journal={IEEE International Symposium on High-Performance Computer Architecture (HPCA)},
  year={2025}
}

This work is subject to a patent application filed by Cornell University.