unsafe-diffusion

This repository provides the data and code for the paper "Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models", accepted at ACM CCS 2023.

Paper: https://arxiv.org/pdf/2305.13873.pdf

Unsafe Image Generation

1. Collecting Prompts

We use three harmful prompt datasets and one harmless prompt dataset. Request the prompt datasets here: https://zenodo.org/record/8255664

4chan prompts (harmful)
Lexica prompts (harmful)
Template prompts (harmful)
COCO prompts (harmless)

2. Generating Images

We use four open-source text-to-image models:

Stable Diffusion: https://github.com/CompVis/stable-diffusion
Latent Diffusion: https://github.com/CompVis/latent-diffusion
DALLE-2 demo: https://github.com/lucidrains/DALLE2-pytorch
DALLE-mini: https://github.com/borisdayma/dalle-mini

3. Unsafe Image Classification

We labeled 800 generated images. Request the image dataset here: https://zenodo.org/record/8255664

Prerequisite

pip install -r requirements.txt

Train the Multi-headed Safety Classifier

python train.py \
  --images_dir ./data/images \
  --labels_dir ./data/labels.xlsx \
  --output_dir ./checkpoints/multi-headed

Evaluate the Classifier and Other Baselines

python evaluate.py \
  --images_dir ./data/images \
  --labels_dir ./data/labels.xlsx \
  --checkpoints_dir ./checkpoints
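
This README does not say which metrics evaluate.py reports. As a rough illustration of how a safe/unsafe classifier is commonly scored, here is a minimal sketch that computes accuracy, precision, recall, and F1 from binary labels. The function name binary_metrics and the label encoding (1 = unsafe, 0 = safe) are assumptions made for this example and are not taken from the repository.

```python
from collections import Counter


def binary_metrics(y_true, y_pred):
    """Score binary predictions; assumes 1 = unsafe and 0 = safe (hypothetical encoding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count each (true label, predicted label) pair once.
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]  # unsafe images flagged as unsafe
    fp = counts[(0, 1)]  # safe images flagged as unsafe
    fn = counts[(1, 0)]  # unsafe images that were missed
    tn = counts[(0, 0)]  # safe images passed as safe

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

When comparing the classifier against baselines, recall on the unsafe class is often the figure to watch, since a missed unsafe image is usually the more costly error.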

Directly Use the Classifier to Detect Unsafe Images

python inference.py \
  --images_dir ./data/images \
  --output_dir ./results
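
The train, evaluate, and inference steps can also be chained from a single Python script, which avoids shell line-continuation mistakes. The sketch below only assembles the commands using the script names, flags, and paths listed in this section. The helper names build_commands and run_all are hypothetical and not part of the repository.

```python
import subprocess
import sys


def build_commands(images_dir="./data/images",
                   labels_path="./data/labels.xlsx",
                   checkpoints_dir="./checkpoints",
                   results_dir="./results"):
    """Return argv lists for the train, evaluate, and inference steps (hypothetical helper)."""
    py = sys.executable  # run the scripts with the current interpreter
    train = [py, "train.py",
             "--images_dir", images_dir,
             "--labels_dir", labels_path,
             "--output_dir", f"{checkpoints_dir}/multi-headed"]
    evaluate = [py, "evaluate.py",
                "--images_dir", images_dir,
                "--labels_dir", labels_path,
                "--checkpoints_dir", checkpoints_dir]
    inference = [py, "inference.py",
                 "--images_dir", images_dir,
                 "--output_dir", results_dir]
    return [train, evaluate, inference]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling run_all(build_commands()) from the repository root would run the three steps in sequence. Passing argument lists instead of a shell string means paths containing spaces need no extra quoting.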

Hateful Meme Generation

We employ three image editing techniques on top of Stable Diffusion:

DreamBooth: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion
Textual Inversion: https://github.com/rinongal/textual_inversion
SDEdit: https://github.com/CompVis/stable-diffusion

Reference

If you find this helpful, please cite the following work:

@inproceedings{QSHBZZ23,
author = {Yiting Qu and Xinyue Shen and Xinlei He and Michael Backes and Savvas Zannettou and Yang Zhang},
title = {{Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models}},
booktitle = {{ACM SIGSAC Conference on Computer and Communications Security (CCS)}},
publisher = {ACM},
year = {2023}
}