# Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks
<img src="examples/framework.png" width="70%" height="40%">
## Dependencies
- python == 3.8.13
- PyTorch == 2.0.1
- transformers == 4.23.1
- diffusers == 0.11.1
- ftfy == 6.1.1
- accelerate == 0.22.0
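Since the pins above are exact, a quick sanity check of the installed versions can save debugging time. A minimal sketch:

```python
# Print installed versions to compare against the pins listed above.
import torch, transformers, diffusers, ftfy, accelerate

for name, module in [("torch", torch), ("transformers", transformers),
                     ("diffusers", diffusers), ("ftfy", ftfy),
                     ("accelerate", accelerate)]:
    print(f"{name}: {module.__version__}")
```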
## Usage
- Download word2id.pkl and wordvec.pkl for the synonym model, and put the downloaded files into the Word2Vec directory (a loading sketch follows the commands below).
- A script is provided to perform targeted attacks on Stable Diffusion:
```bash
# Training for generating the adversarial prompts
python run.py --config_path ./object_config.json   # Object attacks
python run.py --config_path ./style_config.json    # Style attacks

# Testing for evaluating the attack success rate
python test_object_multi.py --config_path ./object_config.json   # Object attacks
python test_style_multi.py --config_path ./style_config.json     # Style attacks

# Testing for evaluating the FID score of generated images
python IQA.py --gen_img_path [the root of generated images] --task [object or style] --attack_goal_path [the path of referenced images] --metric image_quality
```
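The word2id.pkl and wordvec.pkl files from the download step feed the synonym model. A minimal loading sketch is below; the assumed layout (a token-to-index dict plus a matching embedding matrix) is a guess and may differ from the actual pickles:

```python
import pickle

# Assumed layout (not documented here): word2id.pkl maps each token to
# a row index, and wordvec.pkl holds the corresponding embedding
# vectors used by the synonym model.
with open("Word2Vec/word2id.pkl", "rb") as f:
    word2id = pickle.load(f)
with open("Word2Vec/wordvec.pkl", "rb") as f:
    wordvec = pickle.load(f)

print(f"{len(word2id)} tokens in the synonym vocabulary")
```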
## Parameters
The config is loaded from a JSON file and has the following parameters:
- `add_suffix_num`: the number of suffixes in the word addition perturbation strategy. Default: 5.
- `replace_type`: a list specifying the word types to replace in the word substitution strategy. The default `["all"]` replaces all words except nouns. Options: `["verb", "adj", "adv", "prep"]`.
- `synonym_num`: the number of forbidden synonyms. Default: 10.
- `iter`: the total number of iterations. Default: 500.
- `lr`: the learning rate for the optimizer. Default: 0.1.
- `weight_decay`: the weight decay for the optimizer.
- `loss_weight`: the weight of the MSE loss in style attacks.
- `print_step`: the number of steps between status printouts.
- `batch_size`: the number of reference images used in each iteration.
- `clip_model`: the name of the CLIP model to use. `"laion/CLIP-ViT-H-14-laion2B-s32B-b79K"` is the model used in SD 2.1.
- `prompt_path`: the path of the clean prompt file.
- `task`: the targeted attack task, either `"object"` or `"style"`.
- `forbidden_words`: a txt file listing the forbidden words for each target goal.
- `target_path`: the file path of the reference images.
- `output_dir`: the path for saving the learned adversarial prompts.
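For illustration, a complete config covering the fields above might look like the sketch below (built here with Python and dumped to JSON; the paths and unmarked values are placeholders, not the repo's shipped defaults):

```python
import json

# Illustrative config using the fields documented above; paths and
# values marked "placeholder" are made up for this example.
config = {
    "add_suffix_num": 5,        # default
    "replace_type": ["all"],    # default
    "synonym_num": 10,          # default
    "iter": 500,                # default
    "lr": 0.1,                  # default
    "weight_decay": 0.01,       # placeholder
    "loss_weight": 1.0,         # placeholder
    "print_step": 50,           # placeholder
    "batch_size": 4,            # placeholder
    "clip_model": "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    "prompt_path": "./prompts/object_prompts.txt",   # placeholder path
    "task": "object",
    "forbidden_words": "./forbidden_words/dog.txt",  # placeholder path
    "target_path": "./targets/dog/",                 # placeholder path
    "output_dir": "./outputs/object/",               # placeholder path
}

with open("object_config.json", "w") as f:
    json.dump(config, f, indent=2)
```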
## Adversarial Attack Dataset
We release the adversarial attack dataset used to perform object attacks on Stable Diffusion. The dataset is available at [Link].
## Citation
If you find this repo useful, please consider citing:
```bibtex
@article{zhang2024revealing,
  title={Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks},
  author={Zhang, Chenyu and Wang, Lanjun and Liu, Anan},
  journal={arXiv preprint arXiv:2401.08725},
  year={2024}
}
```