Quantized Inference on Generative LLMs (QIGen)

A code generator for inference on quantized Large Language Models (LLMs). Quantization is performed with GPTQ.

Current features

TODOs

Usage

Installation

  1. Install the dependencies: `pip install -r requirements.txt`
  2. Install transformers from source: `pip install git+https://github.com/huggingface/transformers`
  3. Install the Python module: `python setup.py install`. This runs a search to find the best parameters for register usage.
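The steps above, collected into a single shell session (assuming you are in the repository root with a suitable Python environment active):

```shell
# 1. Install the Python dependencies listed by the project
pip install -r requirements.txt

# 2. Install transformers from source, as required by QIGen
pip install git+https://github.com/huggingface/transformers

# 3. Build and install the QIGen module; this also runs the
#    search for the best register-usage parameters
python setup.py install
```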

Example

We provide an example notebook in `demo.ipynb`. The basic workflow is