One Step Extraction of Training Images from Diffusion Models
In this work, we have "extracted" training images from several diffusion models, similar to [1]: generated images that are exact copies of images in the training set. Our attack is more efficient than [1], and our labeling can also extract images that are not exact copies but instead vary in fixed spatial locations (see below). Read about it on arXiv: A Reproducible Extraction of Training Images from Diffusion Models [2].
To verify our attack, you first generate some images, then download the corresponding images from LAION-2B along with our set of templates / masks, and finally check that their MSE is indeed low enough (or verify by inspection). The code below verifies our whitebox attack on SDV1:
pip install -r requirements.txt
sh verify_sdv1_wb_attack.sh
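If you want to spot-check a single extraction by hand, the following is a minimal sketch of the MSE check described above. The file names and any threshold you pick are placeholders, not values from verify_sdv1_wb_attack.sh:

```python
# Minimal sketch: masked MSE between a generated image and its LAION-2B match.
# File names are placeholders; pick your own threshold for "low enough".
import numpy as np
from PIL import Image

def masked_mse(gen_path, laion_path, mask_path=None, size=(512, 512)):
    gen = np.asarray(Image.open(gen_path).convert("RGB").resize(size), dtype=np.float32) / 255.0
    ref = np.asarray(Image.open(laion_path).convert("RGB").resize(size), dtype=np.float32) / 255.0
    err = (gen - ref) ** 2
    if mask_path is not None:
        # The mask selects the fixed (non-varying) spatial locations of a template verbatim.
        mask = np.asarray(Image.open(mask_path).convert("L").resize(size), dtype=np.float32) / 255.0
        err = err * mask[..., None]
        return float(err.sum() / (mask.sum() * 3 + 1e-8))
    return float(err.mean())

score = masked_mse("generated.png", "laion_match.png", mask_path="template_mask.png")
print(f"masked MSE = {score:.4f}")  # low values indicate a (template) verbatim copy
```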
Update 7.24
You may now run our whitebox (WB) and blackbox (BB) attacks against SDV1 and SDV2. For instance, to run the BB attack against SDV2:
sh configs/run_bb_attack_30k_sdv2.sh
This will download the file membership_attack_top30k.parquet from Hugging Face, which contains the 30k captions with the top scores from our whitebox attack on SDV1 (the corresponding "BB" attacks against SDV1 are therefore not entirely blackbox, but are included to study the attack). A blackbox attack can also be performed on a much larger caption set, selected through deduplication, with:
sh configs/run_wb_attack_2M.sh
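If you would like to peek at the downloaded caption set before launching a full run, a minimal pandas sketch is below; the column names depend on the file and are not documented here:

```python
# Quick look at the caption set driving the attack. Requires pandas + pyarrow.
# The columns printed are whatever the parquet ships with; none are assumed here.
import pandas as pd

df = pd.read_parquet("membership_attack_top30k.parquet")
print(df.shape)                # expect roughly 30k rows
print(df.columns.tolist())
print(df.head(10))
```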
Roadmap
- Verify our ground truth
- Perform our whitebox and blackbox attacks vs. SDV1 and SDV2 (July 2023)
- Carlini attack doing full synthesis over many seeds
- Composed attacks (ours + carlini on top scores)
- Compute duplicates extracted / precision curve
- Multi-GPU / multi-node large-scale attacks
- Perform WB/BB attacks against DeepFloyd IF
- Retrieve images using CLIP index
- Pipeline for mask creation
- End-to-end attack on Colab on a small set of captions
[1] N. Carlini et al. Extracting Training Data from Diffusion Models. arXiv preprint arXiv:2301.13188, 2023.
[2] A Reproducible Extraction of Training Images from Diffusion Models (this work).
Some regurgitated prompts
Regurgitated prompts can be found in the following parquets. They are labeled as 'MV', 'RV', or 'TV' in the 'overfit_type' field (a loading sketch follows the tables below):
*** NOTE: the Midjourney examples only apply to v4. The newer version (v5) seems to have mitigated the problem, so you must append --v 4 to the prompts. ***
Midjourney examples
prompt | type |
---|---|
Prince Reunites With Warner Brothers, Plans New Album --seed 2 --stylize 0 --stop 10 --v 4 | Exact |
Will Ferrell, John C. Reilly in Talks for <i>Border Guards</i> --seed 0 --stylize 0 --stop 10 --v 4 | Exact |
Design Art Light in Dense Fall Forest Landscape Oversized Circle Wall Art --seed 4 --q 2 --v 4 | Template |
Shaw Floors Spice It Up Tyler Taupe 00103_E9013 --seed 16 --stylize 0 --stop 10 --v 4 | Template |
Stable Diffusion V1
prompt | type |
---|---|
Galaxy Women's Leather Boots | Template |
Rambo 5 und Rocky Spin-Off - Sylvester Stallone gibt Updates | Exact |
Full body U-Zip main opening - Full body U-Zip main opening on front of bag for easy unloading when you get to camp | Exact |
Mothers influence on her young hippo | Exact |
DeepFloyd IF
prompt | type |
---|---|
Designart Green Abstract Metal Grill ContemporaryArt On Canvas - 5 Panels | Template |
New York Luggage Cover - New York / S - Luggage covers | Template |
Foyer painted in HABANERO | Template |
Shaw Floors Value Collections Sandy Hollow Cl II Net Alpine Fern 00305_5E510 | Template |
Stable Diffusion V2
prompt | type |
---|---|
Pencil pleat curtains in collection Velvet, fabric: 704-18 | Template |
Skull Of A Skeleton With A Burning Cigarette - Vincent Van Gogh Wall Tapestry | Template |
Shaw Floors Couture' Collection Ultimate Expression 15′ Sahara 00205_19829 | Template |
Sting Like A Bee By Louisa - Throw Pillow | Template |
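As a rough illustration of how to pull labeled prompts out of these parquets, here is a sketch; the parquet path is a placeholder, but the 'overfit_type' field and its 'MV'/'RV'/'TV' labels are the ones described above:

```python
# Sketch: select regurgitated prompts by their label in the 'overfit_type' field.
# "labeled_prompts.parquet" is a placeholder path for one of the parquets above.
import pandas as pd

df = pd.read_parquet("labeled_prompts.parquet")
regurgitated = df[df["overfit_type"].isin(["MV", "RV", "TV"])]
print(regurgitated["overfit_type"].value_counts())
```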
Some other files
Top 30K scores for the whitebox attack
The prompts for the 2M most duplicated images
Template Verbatims
Template verbatims for various networks: left is the generated image, middle is the retrieved image, and right is the extracted mask. Template verbatims originate from images that vary in fixed spatial locations in LAION-2B; for instance, in the top-left, the carpet color varies in an e-commerce image. These images are generated in a many-to-many fashion (for instance, the same prompt will generate the top-left and bottom-right images, which come from the "Shaw Floors" prompts).
Idea behind attack
Training images can be extracted from Stable Diffusion in one step. In the first row, a verbatim copy is synthesized from the caption corresponding to the image in the second-to-last column. In the second row, we present verbatim copies that are harder to detect: template verbatims. They typically represent many-to-many mappings (many captions synthesize many verbatim templates), and thus the ground truth is constructed with retrieval (rightmost column). Non-verbatims have no match, even when retrieving over the entire dataset.
Our attack exploits this fast appearance of memorized content: it separates the realistic images in the first two columns from the blurry image in the last column.
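The gist can be reproduced with off-the-shelf diffusers, though the snippet below is only a simplified illustration of the idea, not our actual scoring pipeline; the model ID, the second prompt, and the gradient-energy score are illustrative assumptions:

```python
# Sketch of the one-step intuition with Hugging Face diffusers (not our exact pipeline).
# A memorized caption already yields sharp, realistic content after a single denoising
# step, while an ordinary caption yields a blurry blob; a crude sharpness score separates them.
import numpy as np
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Swap in DDIM so a single denoising step is well defined.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.set_progress_bar_config(disable=True)

def sharpness(prompt: str, seed: int = 0) -> float:
    gen = torch.Generator("cuda").manual_seed(seed)
    img = pipe(prompt, num_inference_steps=1, guidance_scale=7.5, generator=gen).images[0]
    x = np.asarray(img.convert("L"), dtype=np.float32) / 255.0
    # Mean gradient magnitude: a crude stand-in for "realistic vs. blurry".
    return float(np.abs(np.diff(x, axis=0)).mean() + np.abs(np.diff(x, axis=1)).mean())

print(sharpness("Mothers influence on her young hippo"))  # candidate verbatim from the SDV1 table
print(sharpness("a photo of a cat sitting on a sofa"))    # ordinary caption; expect a lower score
```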