# Compressed Context Memory

Paper | arXiv | Project Page

Main features of our method: during online interaction, the accumulated context is recurrently compressed into a compact key/value memory, reducing memory usage and attention cost while retaining performance.

## Setup

```bash
conda create --name ccm python=3.9
conda activate ccm
pip install -r requirements.txt
```
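
After installing, a quick import check confirms the environment is usable. This assumes `requirements.txt` pins PyTorch and Hugging Face Transformers (the libraries the commands below rely on); adjust if your pins differ.

```python
# Sanity check for the ccm environment (assumes requirements.txt
# installs torch and transformers; adjust the imports if it differs).
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```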

Supported Models: LLaMA / LLaMA-2-chat / Mistral

> [!IMPORTANT]
> We release datasets and models via gdown (see below).

> [!TIP]
> Demo: interactive inference with compressed memory.

```bash
python download.py --type model --name [unified,pretrain]  # Download adapters
python inference.py -i -m [llama-7b,llama-2-7b-chat] --eval_name concat_recur
```
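
Conceptually, the demo keeps a compressed memory across dialogue turns: each new turn is compressed into a few memory slots, and the model attends to those slots instead of the full history. The sketch below illustrates the `concat_recur` update with toy tensors; `compress` is a hypothetical stand-in for the learned compression adapter, not the repository's API.

```python
# Toy illustration of recurrent context compression across turns.
# `compress` is a placeholder: the real method trains an adapter with
# <COMP> tokens and compresses attention key/value states.
import torch

def compress(states: torch.Tensor, n_tok: int) -> torch.Tensor:
    """Reduce [seq_len, dim] states to [n_tok, dim] memory slots
    (mean-pooled chunks, used here only so the sketch runs)."""
    return torch.stack([c.mean(dim=0) for c in states.chunk(n_tok, dim=0)])

memory = torch.empty(0, 64)             # compressed memory across turns
for turn in range(3):
    turn_states = torch.randn(128, 64)  # hidden states of the new turn
    # concat_recur: append this turn's slots to the running memory
    memory = torch.cat([memory, compress(turn_states, n_tok=8)], dim=0)
print(memory.shape)                     # torch.Size([24, 64])
```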

> [!NOTE]
> Streaming setting:

```bash
python download.py --type data --name pg19
python download.py --type model --name pretrain
python inference.py --stream
```

<img src="https://github.com/snu-mllab/Context-Memory/blob/main/image/stream.png" align="center" width=90%>
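
In the streaming setting, a long document (e.g., a PG19 book) is consumed chunk by chunk while only a bounded compressed memory is carried forward, so memory usage stays constant regardless of stream length. Below is a minimal text-level sketch of that loop, with illustrative names rather than the repository's API.

```python
# Minimal sketch of streaming with bounded memory. The real method
# compresses attention key/values with a learned adapter; the string
# truncation below is only a placeholder that keeps the loop runnable.
def update_memory(memory: str, chunk: str, budget: int = 512) -> str:
    return (memory + chunk)[-budget:]   # keep a fixed-size summary

book = "lorem ipsum " * 5000            # stand-in for a PG19 document
memory = ""
for start in range(0, len(book), 2048): # consume the stream in chunks
    memory = update_memory(memory, book[start:start + 2048])
print(len(memory))                      # stays at the 512-character budget
```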

## Dataset

```bash
python download.py --type data --name [metaicl,soda]
```

## Training

Step 1 (optional): Finetuning LLaMA. We recommend first finetuning the pretrained LLaMA model on the target dataset:

```bash
python run.py --train --dataset [unified,metaicl,dialog,lamp] --model llama-7b \
    --comp_type no
```

Step 2: Training a compression adapter.

```bash
python run.py --train --dataset [unified,metaicl,dialog,lamp] --model llama-7b \
    --load_path llama-7b-no \
    --attn_type [concat_recur,merge_recur] --n_tok [# <COMP> tokens]
```
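
The two `--attn_type` options differ in how the memory is updated at each compression step: `concat_recur` concatenates the new compressed slots, so the memory grows by `--n_tok` slots per step, whereas `merge_recur` merges them into a memory of fixed size. Below is a hedged sketch of the two update rules, with plain tensors standing in for compressed key/value states; a running average is one simple merge rule, and the repository's exact rule may differ.

```python
# Sketch of the two memory-update rules selected by --attn_type.
# Tensors stand in for compressed attention key/value states.
import torch

def concat_recur(memory: torch.Tensor, slots: torch.Tensor) -> torch.Tensor:
    return torch.cat([memory, slots], dim=0)     # grows by n_tok per step

def merge_recur(memory: torch.Tensor, slots: torch.Tensor, step: int) -> torch.Tensor:
    return (memory * step + slots) / (step + 1)  # fixed-size running average

mem_c, mem_m = torch.empty(0, 64), torch.zeros(8, 64)
for t in range(3):
    slots = torch.randn(8, 64)                   # this step's n_tok=8 slots
    mem_c = concat_recur(mem_c, slots)
    mem_m = merge_recur(mem_m, slots, t)
print(mem_c.shape, mem_m.shape)   # torch.Size([24, 64]) torch.Size([8, 64])
```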

## Evaluation

```bash
python download.py --type model --name [unified,pretrain,metaicl,dialog,lamp]
python run.py --dataset [metaicl,dialog,lamp] --model llama-7b \
    --load_path llama-7b-no \
    --eval_path [path for compression adapter] \
    --attn_type [concat_recur,merge_recur]
```

## Citation

```bibtex
@inproceedings{kim2024compressed,
  title={Compressed Context Memory for Online Language Model Interaction},
  author={Jang-Hyun Kim and Junyoung Yeom and Sangdoo Yun and Hyun Oh Song},
  booktitle={ICLR},
  year={2024}
}
```