CacheGen: Fast Context Loading for Language Model Applications via KV Cache Streaming

For the latest updates and integrations, please check out the LMCache project.

LMCache: Modules for KV cache encoding and decoding with CacheGen's customized codec.
test_data: Example test cases for CacheGen.
src: Helper functions used by CacheGen (e.g., converting a tensor to a tuple and a tuple back to a tensor).

To install the Python packages required to run CacheGen, using conda:

conda env create -f env.yaml
conda activate cachegen
pip install -e LMCache
cd LMCache/third_party/torchac_cuda 
python setup.py install
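
After the steps above finish, a quick sanity check can confirm that the packages are visible to the active environment. The following is a minimal sketch, not part of CacheGen itself: the helper `missing_modules` is hypothetical, and the module names `torch`, `lmcache`, and `torchac_cuda` are assumptions based on the install commands, so adjust them if your environment exposes different names.

```python
from importlib.util import find_spec

# Assumed top-level module names (not confirmed by this README); adjust as needed.
ASSUMED_MODULES = ("torch", "lmcache", "torchac_cuda")


def missing_modules(names=ASSUMED_MODULES):
    """Return the names from `names` that Python cannot locate in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All assumed modules were found.")
```

Run it inside the activated cachegen environment. The check only locates the modules without importing them, so it does not verify that the CUDA extension actually loads.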

Please refer to sigcomm_ae.md for examples of running CacheGen.