EmbMarker

Code and data for our paper "Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark" in ACL 2023.

Introduction

EmbMarker is an embedding watermark method that implants backdoors in the embeddings served by an Embedding as a Service (EaaS) provider. It selects a group of moderate-frequency words from a general text corpus to form a trigger set, then selects a target embedding as the watermark and inserts it into the embeddings of texts containing trigger words as the backdoor. The weight of insertion is proportional to the number of trigger words included in the text. This allows the watermark backdoor to be effectively transferred to an EaaS stealer's model for copyright verification while minimizing the adverse impact on the utility of the original embeddings. Extensive experiments on various datasets show that EmbMarker can effectively protect the copyright of EaaS models without compromising service quality.
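The insertion step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repo: the whitespace tokenization, the linear blend, the re-normalization, and the names `watermark_embedding` and `max_trigger_count` are all assumptions made for the example.

```python
import numpy as np


def watermark_embedding(text, embedding, trigger_words, target_embedding,
                        max_trigger_count=4):
    """Blend a target embedding into an original embedding.

    The blend weight grows linearly with the number of distinct trigger
    words found in the text and is capped at 1.0 (hypothetical cap).
    """
    # Simplifying assumption: lowercase whitespace tokenization.
    tokens = set(text.lower().split())
    n_triggers = len(tokens & set(trigger_words))

    # Weight proportional to the trigger count, saturating at max_trigger_count.
    weight = min(n_triggers, max_trigger_count) / max_trigger_count

    # Linear interpolation between the original and the target embedding.
    mixed = (1.0 - weight) * np.asarray(embedding, dtype=float) \
        + weight * np.asarray(target_embedding, dtype=float)

    # Re-normalize to unit length (assumed, to match unit-norm embeddings).
    norm = np.linalg.norm(mixed)
    return mixed / norm if norm > 0 else mixed
```

A text without trigger words keeps its original embedding (up to normalization), while a text containing `max_trigger_count` or more distinct trigger words is mapped onto the target embedding.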

Environment

Docker

We suggest using Docker to manage environments. You can pull the pre-built image from Docker Hub:

docker pull yjw1029/torch:1.13.0

or build the image yourself:

docker build -f Dockerfile -t yjw1029/torch:1.13.0 .

conda or pip

You can also install the required packages with conda or pip. The package requirements are as follows:

accelerate>=0.12.0
wandb
transformers==4.25.1
evaluate==0.3.0
datasets
torch==1.13.0
numpy
tqdm

# if you want to request embeddings from openai api
openai

Getting Started

We have released all required datasets, queried GPT embeddings, and word count files. You can download the embeddings and MIND news files via our gdown-based script.

pip install gdown
bash preparation/download.sh

Or download the files manually by following the guidelines below.

Preparing dataset

We directly use the SST2, Enron Spam and AG News datasets published on Hugging Face Datasets. For the MIND dataset, we merge all the news in its recommendation logs and split it into train and test files. You can download the train file here and the test file here.

Requesting GPT3 Embeddings

We release the pre-requested embeddings. You can click the links to download them one by one into the data directory.

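| dataset | split | download link |
| --- | --- | --- |
| SST2 | train | link |
| SST2 | validation | link |
| SST2 | test | link |
| Enron Spam | train | link |
| Enron Spam | test | link |
| AG News | train | link |
| AG News | test | link |
| MIND | all | link |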

Because the OpenAI embedding API is not fully deterministic, we recommend using our released embeddings to reproduce the experiments. We will release the full embedding-requesting script soon.

export OPENAI_API_KEYS="YOUR API KEY"
cd preparation
python request_emb.py # to be released

Counting word frequency

The pre-computed word count file is here. You can also preprocess the WikiText dataset to produce the same file.

cd preparation
python word_count.py
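
For illustration, here is one way a trigger set of moderate-frequency words could be drawn from such word counts. This is only a sketch under stated assumptions: the function name `select_trigger_words`, the frequency thresholds, the trigger set size, and the use of document frequency are hypothetical and are not taken from the scripts in this repo.

```python
import random


def select_trigger_words(word_counts, total_docs, min_freq=0.005,
                         max_freq=0.01, n_triggers=20, seed=0):
    """Sample trigger words whose relative frequency is in [min_freq, max_freq].

    word_counts maps each word to the number of documents containing it;
    total_docs is the corpus size. All thresholds are illustrative defaults.
    """
    # Keep only words that are neither too rare nor too common.
    candidates = sorted(
        word for word, count in word_counts.items()
        if min_freq <= count / total_docs <= max_freq
    )

    # Seeded sampling so the same trigger set can be regenerated.
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_triggers, len(candidates)))
```

Very frequent words would watermark almost every text and hurt embedding utility, while very rare words would seldom appear in a stealer's queries, which is the intuition behind restricting the trigger set to a moderate-frequency band.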

Run Experiments

Set your wandb key in wandb.env, using the same format as wandb_example.env. If you pulled our Docker image, start the experiments with docker-compose.

# Run EmbMarker on SST2, MIND, Enron Spam and AG News
docker-compose up sst2
docker-compose up mind
docker-compose up enron
docker-compose up ag

# Run the advanced version of EmbMarker on SST2, MIND, Enron Spam and AG News
docker-compose up sst2_adv
docker-compose up mind_adv
docker-compose up enron_adv
docker-compose up ag_adv

Or run the following commands:

# Run EmbMarker on SST2, MIND, Enron Spam and AG News
bash commands/run_sst2.sh
bash commands/run_mind.sh
bash commands/run_enron.sh
bash commands/run_ag.sh

# Run the advanced version of EmbMarker on SST2, MIND, Enron Spam and AG News
bash commands/run_sst2_adv.sh
bash commands/run_mind_adv.sh
bash commands/run_enron_adv.sh
bash commands/run_ag_adv.sh

Results

Taking the experiments on SST2 as an example, you can check the results on wandb.

Detection performance:

<img src="figure/detection.png" alt="Detection Performance" style="width: 400px">

Classification performance:

<img src="figure/accuracy.png" alt="Accuracy" style="width: 120px">

Visualization:

<img src="figure/visualization.png" alt="Visualization" style="width: 200px">

Citing

Please cite the paper if you use the data or code in this repo.

@inproceedings{peng-etal-2023-copying,
    title = "Are You Copying My Model? Protecting the Copyright of Large Language Models for {E}aa{S} via Backdoor Watermark",
    author = "Peng, Wenjun  and
      Yi, Jingwei  and
      Wu, Fangzhao  and
      Wu, Shangxi  and
      Bin Zhu, Bin  and
      Lyu, Lingjuan  and
      Jiao, Binxing  and
      Xu, Tong  and
      Sun, Guangzhong  and
      Xie, Xing",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2023",
    pages = "7653--7668",
}