
An Unforgeable Publicly Verifiable Watermark for Large Language Models (UPV)

News

💡 Our unforgeable watermark has been integrated into MarkLLM, an open-source toolkit for LLM watermarking. You can now try out our watermarking method in the MarkLLM repository!

Conda Environment

Four Steps

Step 1: Generate data for training the watermark generator

We first need to train the watermark generator network so that it divides the vocabulary into red and green lists evenly (approximately half green and half red).

python generate_data.py --bit_number 16 --window_size 3 --sample_number 2000 --output_file ./train_generator_data/train_generator_data.jsonl

The value of bit_number depends on the LLM you choose: it must be large enough to encode every token ID in binary. For example, gpt2 has a vocabulary size of 50,257 ($2^{15}-1<50257<2^{16}-1$), so we set bit_number=16.
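As a concrete (hypothetical) illustration, the sketch below encodes each token ID as a bit_number-bit binary vector and writes one record per window with a roughly balanced green/red label; the actual records produced by generate_data.py may be laid out differently.

```python
# Hypothetical sketch of one training record for the watermark generator:
# a window of token IDs, each encoded as a bit_number-bit binary vector,
# paired with a roughly balanced 0/1 (red/green) label.
# The exact format written by generate_data.py may differ.

import json
import random

BIT_NUMBER = 16   # matches --bit_number
WINDOW_SIZE = 3   # matches --window_size


def token_to_bits(token_id: int, bit_number: int = BIT_NUMBER) -> list[int]:
    """Encode a token ID as a fixed-length binary vector (most significant bit first)."""
    return [(token_id >> i) & 1 for i in reversed(range(bit_number))]


def random_sample(vocab_size: int = 50257) -> dict:
    """Draw a random window of token IDs and assign a balanced binary label."""
    window = [random.randrange(vocab_size) for _ in range(WINDOW_SIZE)]
    return {
        "features": [token_to_bits(t) for t in window],
        "label": random.randint(0, 1),  # ~half green (1), half red (0)
    }


if __name__ == "__main__":
    with open("train_generator_data_sketch.jsonl", "w") as f:
        for _ in range(2000):  # matches --sample_number
            f.write(json.dumps(random_sample()) + "\n")
```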

Step 2: Train the watermark generator

python model_key.py --data_dir ./train_generator_data/train_generator_data.jsonl  --bit_number 16 --model_dir ./model/ --window_size 3 --layers 5
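For intuition, here is a minimal sketch of what such a generator network can look like: a small fully connected network with `layers` linear layers that maps a window of binary-encoded tokens to a green/red probability. The hidden width and exact layer arrangement are assumptions, not necessarily the architecture implemented in model_key.py.

```python
# Minimal sketch (not the exact architecture in model_key.py) of a watermark
# generator: a small fully connected network that maps a window of
# binary-encoded tokens to a green (1) / red (0) decision.

import torch
import torch.nn as nn


class WatermarkGenerator(nn.Module):
    def __init__(self, bit_number: int = 16, window_size: int = 3,
                 layers: int = 5, hidden_dim: int = 64):
        super().__init__()
        dims = [bit_number * window_size] + [hidden_dim] * (layers - 1)
        blocks = []
        for in_dim, out_dim in zip(dims[:-1], dims[1:]):
            blocks += [nn.Linear(in_dim, out_dim), nn.ReLU()]
        blocks.append(nn.Linear(dims[-1], 1))  # final layer outputs a single logit
        self.net = nn.Sequential(*blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_size * bit_number) binary features
        return torch.sigmoid(self.net(x))  # probability that the token is "green"


# Usage: probabilities > 0.5 are treated as green, otherwise red.
model = WatermarkGenerator()
dummy = torch.randint(0, 2, (8, 3 * 16)).float()
print(model(dummy).shape)  # torch.Size([8, 1])
```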

Step 3: Generate training and testing data for the watermark detector

python watermark_model.py --bit_number 16  --train_num_samples 10000 --dataset_name c4 --llm_name gpt2 --output_dir ./data --model_dir ./model/ --window_size 3 --layers 5 --use_sampling True --sampling_temp 0.7 --n_beams 0 --max_new_tokens 200 --delta 2.0
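Conceptually, watermarked text is produced by querying the trained generator at each decoding step and adding delta to the logits of green-listed tokens before sampling. The sketch below shows this logit-biasing step in isolation; how watermark_model.py integrates it with the LLM's decoding loop may differ.

```python
# Sketch of how watermarked text is typically produced: at each decoding step
# the generator marks part of the vocabulary as "green" given the preceding
# window of tokens, and delta is added to the logits of those tokens.
# The exact integration in watermark_model.py may differ.

import torch


def bias_logits(logits: torch.Tensor, green_mask: torch.Tensor,
                delta: float = 2.0) -> torch.Tensor:
    """Add delta to the logits of green-listed tokens before sampling.

    logits:     (vocab_size,) raw scores from the LLM for the next token
    green_mask: (vocab_size,) boolean tensor, True where the generator
                network labels the candidate token as green
    """
    return logits + delta * green_mask.float()


# Toy usage with a random green list covering ~half of the vocabulary.
vocab_size = 50257
logits = torch.randn(vocab_size)
green_mask = torch.rand(vocab_size) < 0.5
biased = bias_logits(logits, green_mask, delta=2.0)
next_token = torch.multinomial(torch.softmax(biased / 0.7, dim=-1), 1)  # temperature 0.7
```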

Step 4: Train and test our private watermark detector

python detector.py --llm_name gpt2 --bit 16 --window_size 3 --input ./data --model_file ./model/sub_net.pt --output_model_dir ./model/ --layers 5 --z_value 4
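The z_value argument is a threshold on the green-list z-score computed by the key-based (public) detector. A minimal sketch, assuming the standard one-proportion test with half of the vocabulary green (variable names here are illustrative):

```python
# Sketch of the green-list z-score that the z_value threshold refers to:
# with roughly half of the vocabulary green, a text of T scored tokens with
# g of them green gives z = (g - T/2) / sqrt(T/4). Texts with z above the
# threshold are treated as watermarked.

import math


def green_list_z_score(num_green: int, num_tokens: int, gamma: float = 0.5) -> float:
    """One-proportion z-test for the fraction of green tokens in a text."""
    expected = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1 - gamma))
    return (num_green - expected) / std


# Example: 130 green tokens out of 200 scored tokens.
print(green_list_z_score(130, 200))  # ~4.24, above a threshold of 4
```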

Model and Data of Main Experiments

In the directory ./experiments/main_experiments/, we provide the trained watermark generator model used in the main experiments, together with the pre-generated training and testing data. For each experiment setting (LLM: gpt2 / opt-1.3b / llama-7b; decoding: top-k or beam search), test_data.jsonl contains 500 sentences of watermarked text (labeled 1) and 500 sentences of the corresponding unwatermarked text (natural corpus, labeled 0).

You can train and test our private watermark detector simply by:

  1. changing line 122 in detector.py into:
train_data = prepare_data(os.path.join('./experiments/train_and_test_data/', 'train_data.jsonl'), train_or_test="train", bit=_bit_number, z_value=z_value, llm_name=llm_name)
  2. running:
python detector.py --llm_name gpt2 --bit 16 --window_size 5 --input ./experiments/train_and_test_data/gpt2/c4_topk/ --model_file ./experiments/generator_model/sub_net.pt --output_model_dir ./experiments/detector_model/gpt2/c4_topk/ --layers 5 --z_value 1

Tips:

Others

Regarding robustness against the rewrite attack (corresponding to Appendix B in our paper), we observed that robustness varies across different watermark generator models. Consequently, we selected watermark generators that demonstrated relatively better robustness; the trained generator models are provided in experiments/robustness/generator_model/. You can train your own detector based on the provided generators. Don't forget to set an appropriate z_value according to the performance of the key-based detector (i.e., the public detector).
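One simple (hypothetical) way to pick z_value is to score held-out watermarked and unwatermarked texts with the key-based detector and take the smallest threshold that keeps false positives near zero. The helper below is illustrative and not part of detector.py.

```python
# Hypothetical way to pick a z_value threshold: compute the key-based z-scores
# on held-out watermarked and unwatermarked texts and choose the smallest
# threshold that keeps false positives near zero. Function names are
# illustrative, not part of detector.py.

def choose_z_threshold(watermarked_z, unwatermarked_z, step=0.5, max_z=8.0):
    """Return the smallest threshold with no false positives on the held-out set."""
    t = 0.0
    while t <= max_z:
        if all(z < t for z in unwatermarked_z):
            return t
        t += step
    return max_z


# Example with toy score lists.
print(choose_z_threshold([5.1, 4.8, 6.0], [0.3, -0.7, 1.9]))  # 2.0
```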

Citation

If you find this repo useful, please cite our paper:

@inproceedings{liu2024an,
  title={An Unforgeable Publicly Verifiable Watermark for Large Language Models},
  author={Aiwei Liu and Leyi Pan and Xuming Hu and Shuang Li and Lijie Wen and Irwin King and Philip S. Yu},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=gMLQwKDY3N}
}