MIMIR

MIMIR - Python package for measuring memorization in LLMs.

Documentation is available here.

Instructions

First, install the Python dependencies:

pip install -r requirements.txt

Then install the package in editable mode:

pip install -e .

To run experiments, use the scripts in scripts/bash.

Note: When using the bash scripts, intermediate results are saved in tmp_results/ and tmp_results_cross/. If your experiment completes successfully, the results are moved into the results/ and results_cross/ directories.

Setting environment variables

You can either provide the following environment variables, or pass them via your config/CLI:

MIMIR_CACHE_PATH: Path to cache directory
MIMIR_DATA_SOURCE: Path to data directory
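
If you launch experiments from Python rather than a shell, the two variables can be set on a copy of the current environment. The following is a minimal sketch, not part of MIMIR: the helper name `mimir_env` and the example paths are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Dict


def mimir_env(cache_path: str, data_source: str) -> Dict[str, str]:
    """Return a copy of the current environment with the MIMIR variables set.

    `mimir_env` is a hypothetical helper, not a function provided by MIMIR.
    """
    env = dict(os.environ)  # copy, so the parent process is left untouched
    env["MIMIR_CACHE_PATH"] = str(Path(cache_path).expanduser())
    env["MIMIR_DATA_SOURCE"] = str(Path(data_source).expanduser())
    return env


# Example paths are placeholders; substitute your own directories.
env = mimir_env("~/mimir/cache", "~/mimir/data")
```

The returned dictionary can then be passed as the `env` argument of `subprocess.run` when invoking `run.py`.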

Using cached data

The data we used for our experiments is available on Hugging Face Datasets. You can either load the data directly from Hugging Face by setting the load_from_hf flag in the config (preferred), or download the cache_100_200_.... folders into your MIMIR_CACHE_PATH directory.

How to run MIA experiments

python run.py --config configs/mi.json

Attacks

We include and implement the following attacks, as described in our paper.

Likelihood (loss). Uses the likelihood of the target datapoint as the score.
Reference-based (ref). Normalizes the likelihood score with the score obtained from a reference model.
Zlib Entropy (zlib). Uses the zlib compression size of a sample to approximate the local difficulty of the sample.
Neighborhood (ne). Generates neighbors using an auxiliary model and measures the change in likelihood.
Min-K% Prob (min_k). Uses the k% of tokens with minimum likelihood for score computation.
Min-K%++ (min_k++). Uses the k% of tokens with minimum normalized likelihood for score computation.
Gradient Norm (gradnorm). Uses the gradient norm of the target datapoint as the score.
ReCaLL (recall). Compares the unconditional and conditional log-likelihoods.
DC-PDD (dc_pdd). Uses the frequency distribution of a large corpus to calibrate token probabilities.
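
To make the simpler scores in the list above concrete, here is a minimal sketch of the loss, ref, zlib, and min_k scores computed from per-token log-probabilities. It is an illustration under stated assumptions, not MIMIR's implementation: the function names, the default k of 0.2, the use of a difference for the reference normalization, and the sign convention (lower score suggests membership) are all choices made for this example.

```python
import zlib
from typing import Sequence


def loss_score(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood of the sample."""
    return -sum(token_logprobs) / len(token_logprobs)


def reference_score(target_logprobs: Sequence[float],
                    ref_logprobs: Sequence[float]) -> float:
    """Target-model loss minus reference-model loss (assumed normalization)."""
    return loss_score(target_logprobs) - loss_score(ref_logprobs)


def zlib_score(text: str, token_logprobs: Sequence[float]) -> float:
    """Loss divided by the zlib-compressed size of the text in bytes."""
    compressed_bytes = len(zlib.compress(text.encode("utf-8")))
    return loss_score(token_logprobs) / compressed_bytes


def min_k_score(token_logprobs: Sequence[float], k: float = 0.2) -> float:
    """Average negative log-probability over the k fraction of least likely tokens."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]  # most negative log-probs first
    return -sum(lowest) / n


# Toy log-probabilities; in practice these come from the target model.
logprobs = [-1.0, -2.0, -3.0, -4.0]
scores = {
    "loss": loss_score(logprobs),
    "zlib": zlib_score("an example sample", logprobs),
    "min_k": min_k_score(logprobs, k=0.5),
}
```

The neighborhood, Min-K%++, gradient-norm, ReCaLL, and DC-PDD attacks need additional models, gradients, or corpus statistics and are not covered by this sketch.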

Adding your own dataset

To extend the package to your own dataset, you can load your data directly inside load_cached() in data_utils.py, or add an additional if-else branch within load() in data_utils.py if the data cannot easily be loaded from memory (or some other source). We will probably add a more general way to do this in the future.

Adding your own attack

To add an attack, create a file for your attack (e.g. attacks/my_attack.py) and implement the interface described in attacks/all_attacks.py. Then, add a name for your attack to the dictionary in attacks/utils.py.
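
The pattern described above (one class per attack, plus a name-to-class dictionary) can be sketched as follows. Everything in this block is hypothetical: `BaseAttack`, its `score` method, `ATTACK_REGISTRY`, and `get_attack` are stand-ins invented for illustration and do not reproduce the actual interface in attacks/all_attacks.py or the dictionary in attacks/utils.py, which you should consult before writing a real attack.

```python
from typing import Dict, Sequence, Type


class BaseAttack:
    """Hypothetical stand-in for the interface in attacks/all_attacks.py."""

    def score(self, document: str, token_logprobs: Sequence[float]) -> float:
        raise NotImplementedError


class MyAttack(BaseAttack):
    """Toy attack: average negative log-likelihood of the document."""

    def score(self, document: str, token_logprobs: Sequence[float]) -> float:
        return -sum(token_logprobs) / len(token_logprobs)


# Hypothetical registry mapping attack names to classes.
ATTACK_REGISTRY: Dict[str, Type[BaseAttack]] = {
    "my_attack": MyAttack,
}


def get_attack(name: str) -> BaseAttack:
    """Instantiate a registered attack by name, failing clearly on unknown names."""
    if name not in ATTACK_REGISTRY:
        raise ValueError(f"Unknown attack: {name!r}")
    return ATTACK_REGISTRY[name]()


attack = get_attack("my_attack")
value = attack.score("some text", [-1.0, -3.0])
```

Looking attacks up by name keeps the experiment config independent of the Python class that implements each attack.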

If you would like to submit your attack to the repository, please open a pull request describing your attack and the paper it is based on.

Citation

If you use MIMIR in your research, please cite our paper:

@inproceedings{duan2024membership,
      title={Do Membership Inference Attacks Work on Large Language Models?}, 
      author={Michael Duan and Anshuman Suri and Niloofar Mireshghallah and Sewon Min and Weijia Shi and Luke Zettlemoyer and Yulia Tsvetkov and Yejin Choi and David Evans and Hannaneh Hajishirzi},
      year={2024},
      booktitle={Conference on Language Modeling (COLM)},
}