GPTFast

Accelerate your Hugging Face Transformers models by 7.6-9x with GPTFast!

Background

GPTFast was originally a set of techniques developed by the PyTorch Team to speed up inference for Llama-2-7b. This pip package generalizes those techniques to all Hugging Face models.

Demo

Side-by-side comparison: GPTFast inference time vs. eager inference time.

Roadmap

Getting Started

WARNING: The documentation below is deprecated as of version 0.3.0. New docs will be up soon!

Documentation

At its core, this library provides a simple interface to LLM inference acceleration techniques. All of the following functions can be imported from GPTFast.Core: