CacheGen: Fast Context Loading for Language Model Applications via KV Cache Streaming

For the latest update and integration, please check out the LMCache project!

This is the code repository for CacheGen: Fast Context Loading for Language Model Applications via KV Cache Streaming (SIGCOMM'24). CacheGen speeds up context loading by encoding a reusable KV cache into a compact bitstream that can be streamed to the serving engine, instead of recomputing the cache from the raw text.
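As a rough illustration of the idea (this is a toy sketch, not CacheGen's actual codec, which is layer- and channel-aware and uses a CUDA arithmetic coder), the snippet below quantizes a synthetic "KV cache" of correlated floats to int8, delta-encodes neighboring values, and entropy-codes the result with zlib. All function names here are hypothetical:

```python
import math
import struct
import zlib

def quantize_int8(values):
    """Uniform int8 quantization (toy stand-in for CacheGen's
    quantization stage)."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def delta_encode(q):
    """Store differences between neighbors: KV values at nearby
    positions are similar, so deltas are small and compress well."""
    return [q[0]] + [b - a for a, b in zip(q, q[1:])]

# Toy "KV cache": smooth, correlated values standing in for real tensors.
kv = [math.sin(i / 50.0) for i in range(4096)]

raw = struct.pack(f"{len(kv)}f", *kv)            # fp32 baseline: 4 B/value
q, scale = quantize_int8(kv)                     # int8: 1 B/value
deltas = bytes(d & 0xFF for d in delta_encode(q))
stream = zlib.compress(deltas, 9)                # entropy-coded bitstream

print(f"fp32={len(raw)}B int8={len(q)}B stream={len(stream)}B")
```

The bitstream is what gets shipped over the network; the receiver reverses the steps (decompress, undo deltas, dequantize with `scale`) to reconstruct an approximate KV cache without a forward pass over the context.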

Installation

To install the required Python packages for running CacheGen with conda:

conda env create -f env.yaml
conda activate cachegen
pip install -e LMCache
cd LMCache/third_party/torchac_cuda 
python setup.py install

Examples

Please refer to sigcomm_ae.md for examples of running CacheGen.

Contact

Yuhan Liu (yuhanl@uchicago.edu)