Membership Inference Attacks against Language Models via Neighbourhood Comparison
This is the repository for the ACL 2023 (Findings) paper: Membership Inference Attacks against Language Models via Neighbourhood Comparison
If you want to run the experiments from the paper (curvature attack, likelihood ratio attack and loss-based attack) that involve fine-tuned GPT-2 models, use this repo.
If you want to run our curvature attack and the other membership inference attacks (likelihood ratio and loss-based attacks) on pre-trained models, you can use the code here. Run the following command to run the membership inference attacks, including the baselines, on a GPT-Neo model, using the Pile as members and XSum as non-members. The code will run all experiments and save the results along with all the metadata.
python run_mia_unified.py --output_name unified_mia --base_model_name EleutherAI/gpt-neo-2.7B --mask_filling_model_name t5-3b --n_perturbation_list 25 --n_samples 2000 --pct_words_masked 0.3 --span_length 2 --cache_dir cache --dataset_member the_pile --dataset_member_key text --dataset_nonmember xsum --ref_model gpt2-xl --max_length 2000
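For reference, the core of the neighbourhood attack is the gap between the target model's loss on a candidate sample and its average loss on "neighbour" texts obtained by mask-and-fill perturbation; a sample is flagged as a training member when that gap falls below a calibrated threshold. Below is a minimal sketch of this score, using small stand-in models (gpt2, t5-small) and a simple word-level span-masking helper; these are illustrative simplifications, not the exact pipeline of run_mia_unified.py.

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    T5ForConditionalGeneration,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Target LM under attack (stand-in: gpt2) and mask-filling model (stand-in: t5-small).
lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = T5ForConditionalGeneration.from_pretrained("t5-small").to(device).eval()

@torch.no_grad()
def lm_loss(text):
    # Average per-token cross-entropy of the text under the target LM.
    ids = lm_tok(text, return_tensors="pt").input_ids.to(device)
    return lm(input_ids=ids, labels=ids).loss.item()

@torch.no_grad()
def make_neighbours(text, n_neighbours=25, span_length=2):
    # Mask one random word span per neighbour and let T5 propose a replacement.
    words = text.split()
    neighbours = []
    for _ in range(n_neighbours):
        i = torch.randint(0, max(1, len(words) - span_length), (1,)).item()
        masked = " ".join(words[:i] + ["<extra_id_0>"] + words[i + span_length:])
        ids = t5_tok(masked, return_tensors="pt").input_ids.to(device)
        out = t5.generate(ids, max_new_tokens=10, do_sample=True)
        raw = t5_tok.decode(out[0], skip_special_tokens=False)
        # Keep only the text generated between the two sentinel tokens.
        fill = raw.split("<extra_id_0>")[-1].split("<extra_id_1>")[0].strip()
        neighbours.append(masked.replace("<extra_id_0>", fill))
    return neighbours

def neighbourhood_score(text):
    # L(x) - mean L(neighbour): the more negative the score, the more likely x was a training member.
    nbs = make_neighbours(text)
    return lm_loss(text) - sum(lm_loss(nb) for nb in nbs) / len(nbs)

print(neighbourhood_score("The quick brown fox jumps over the lazy dog."))

The number of neighbours and the span length in this sketch roughly correspond to the --n_perturbation_list and --span_length flags of the command above.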
This code borrows from this repo.
Citation
@inproceedings{mattern-etal-2023-membership,
    title = "Membership Inference Attacks against Language Models via Neighbourhood Comparison",
    author = "Mattern, Justus and
      Mireshghallah, Fatemehsadat and
      Jin, Zhijing and
      Schoelkopf, Bernhard and
      Sachan, Mrinmaya and
      Berg-Kirkpatrick, Taylor",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.719",
    pages = "11330--11343",
}