Copyright Traps for Large Language Models

This is the accompanying code for the paper Copyright Traps for Large Language Models, published at ICML 2024.

The dataset of traps has been released here!

Generating trap sequences

We generate trap sequences to be injected with the following script:

python src/scripts/gen_traps.py --path-to-model $LLAMA_MODEL --path-to-tokenizer $LLAMA_TOKENIZER -o data/traps.pkl --seq-len 25 -n 500

This example generates 500 sequences of 25 tokens each (26 including the BOS token), with perplexity uniformly distributed over [1, 101). This means the output file will contain 5 sequences with k <= perplexity < k+1 for every k between 1 and 100. Note that in the paper we split sequences into perplexity buckets of size 10 (i.e. [1, 11), [11, 21), etc.).
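The bucketing described above can be sketched in a few lines (illustrative only, not taken from the repo; the actual logic lives in src/scripts/gen_traps.py):

```python
# Illustrative sketch of the bucketing scheme: perplexities in [1, 101)
# are split into equal-width buckets.
def bucket(perplexity, width=1):
    """Return the [low, high) bucket of the given width containing `perplexity`."""
    low = 1 + width * int((perplexity - 1) // width)
    return (low, low + width)

# 500 sequences spread uniformly over 100 unit-width buckets -> 5 per bucket.
n_sequences, n_buckets = 500, 100
assert n_sequences // n_buckets == 5

# The paper's analysis instead groups sequences into width-10 buckets:
assert bucket(15.7, width=10) == (11, 21)
```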

Important arguments:

- --path-to-model, --path-to-tokenizer: paths to the (LLaMA) model and tokenizer used to generate and score sequences
- -o: path of the output pickle file
- --seq-len: number of tokens per trap sequence
- -n: number of sequences to generate

Injecting trap sequences

We inject trap sequences generated at the previous step by running the following:

python "src/scripts/inject_traps.py" --path-to-tokenizer "$LLAMA_TOKENIZER_PATH" --path-to-raw-dataset "$INPUT_DATASET_PATH" --path-to-trap-dir "data/traps/" --output-ds-path "data/injected/dataset_with_traps" --output-info-path "data/injected/trap_info.pkl" --n-reps 1 10 100 1000 --seed 1111

The input dataset should be a Hugging Face dataset that can be loaded with the load_from_disk() method. It should contain at least one document per trap sequence provided, since we inject each sequence into a single document (repeating it the requested number of times).

The folder specified in --path-to-trap-dir is expected to contain only the output of the previous step (potentially run multiple times with different parameters): this script will iterate over and read all files in that folder.
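The directory-reading behaviour can be pictured as follows (an assumed sketch; the real loading code is in src/scripts/inject_traps.py):

```python
# Assumed sketch: every file in the trap folder is unpickled and its
# sequences are collected into one list.
import pickle
from pathlib import Path

def load_all_traps(trap_dir):
    """Read every pickle file in `trap_dir` and return all trap sequences."""
    traps = []
    for path in sorted(Path(trap_dir).iterdir()):
        with open(path, "rb") as f:
            traps.extend(pickle.load(f))
    return traps
```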

This generates two outputs: the dataset itself and the metadata. The dataset contains the original data plus the injected sequences. The metadata is a dataframe (pickled, since that makes it easier to handle list-valued columns) containing all the injection metadata: which trap was injected into which document, and how many times.
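Conceptually, the injection step pairs each trap with one document and records that pairing as metadata. A hypothetical, simplified sketch (names and mechanics are assumptions; see src/scripts/inject_traps.py for the real implementation):

```python
# Hypothetical sketch: each trap goes into its own document, repeated
# n_reps times, and the (trap, document, n_reps) triple is recorded.
import random

def inject_traps(documents, traps, n_reps_options, seed=1111):
    rng = random.Random(seed)
    assert len(documents) >= len(traps), "need at least one document per trap"
    docs = list(documents)
    info = []
    for i, trap in enumerate(traps):
        n_reps = rng.choice(n_reps_options)
        docs[i] = docs[i] + (" " + trap) * n_reps  # repeat the trap n_reps times
        info.append({"trap": trap, "doc_index": i, "n_reps": n_reps})
    return docs, info
```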

Important arguments:

- --path-to-raw-dataset: Hugging Face dataset (loadable with load_from_disk()) to inject traps into
- --path-to-trap-dir: folder containing the trap files generated at the previous step
- --output-ds-path, --output-info-path: where to write the injected dataset and the injection metadata
- --n-reps: list of possible repetition counts for injected sequences
- --seed: random seed for reproducibility

Please refer to src/scripts/run_all.sh for the full pipeline.

Membership inference

Note that we are not yet releasing the exact trap sequences used for training, so the data analysis code is provided for illustrative purposes only.

Code to generate our figures and tables is located in the notebooks/ folder, with one exception: the script for Table 3, which lives in src/scripts.

The key step in our analysis is performing a Ratio Membership Inference Attack (MIA) on trap sequences. For each sequence, it computes the ratio of the target model's (Croissant) perplexity to the reference model's (LLaMA) perplexity. The intuition is that we want to measure the change in perplexity relative to a model that has not seen the sequence, without retraining the full Croissant model. See utils.py for more details on the MIA implementation.
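The ratio score itself reduces to a few lines, assuming per-token log-probabilities (natural log) are available from each model (a minimal sketch; the actual implementation is in utils.py):

```python
# Minimal sketch of the Ratio MIA score from per-token log-probabilities.
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean per-token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def ratio_mia(target_logprobs, reference_logprobs):
    """Target-model perplexity divided by reference-model perplexity.
    A lower ratio suggests the target model has seen (memorized) the sequence."""
    return perplexity(target_logprobs) / perplexity(reference_logprobs)

# If the target assigns higher probability than the reference, the ratio < 1:
assert ratio_mia([-0.5] * 4, [-2.0] * 4) < 1.0
```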

We evaluate our MIA on a balanced dataset of members and non-members. We therefore run the gen_traps.py script one more time with the same hyperparameters and the same number of trap sequences, but do not inject them into the dataset. The data analysis notebooks expect non-member traps to be located in the same folder, with one file per sequence length, where the sequence length is indicated in the filename. The filename template should then be provided as NON_MEMBERS_PATH_TEMPLATE.
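Given the perplexity ratios for the member and non-member groups, the attack can be scored with a pairwise AUC, since members should get lower ratios (a sketch under that assumption, not code from the repo):

```python
# Sketch: AUC of the MIA on a balanced member / non-member set, computed
# as the probability that a random member ratio is below a random
# non-member ratio (0.5 = chance, 1.0 = perfect separation).
def mia_auc(member_ratios, non_member_ratios):
    pairs = len(member_ratios) * len(non_member_ratios)
    wins = sum(m < n for m in member_ratios for n in non_member_ratios)
    ties = sum(m == n for m in member_ratios for n in non_member_ratios)
    return (wins + 0.5 * ties) / pairs
```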

For instance, if your non-member sequences are located at /data/traps/non_members_len_25, /data/traps/non_members_len_50, and /data/traps/non_members_len_100, you should set

NON_MEMBERS_PATH_TEMPLATE="/data/traps/non_members_len_%d"

Citation

Please cite this work as

@inproceedings{meeuscopyright,
  title={Copyright Traps for Large Language Models},
  author={Meeus, Matthieu and Shilov, Igor and Faysse, Manuel and de Montjoye, Yves-Alexandre},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024}
}