
One Step Extraction of Training Images from Diffusion Models

In this work, we have "extracted" training images from several diffusion models, similar to [1]. These are generated images that are exact copies of images in the training set. Our attack is more efficient than that of [1], and our labeling can also extract images that are not exactly the same but vary in fixed spatial locations (see below). Read about it on arXiv: A Reproducible Extraction of Training Images from Diffusion Models [2].

To verify our attack, you'll first have to generate some images, then download the corresponding images from LAION-2B along with our set of templates / masks, and finally check that their MSE is low enough (or verify by inspection). The code below verifies our whitebox attack on SDV1:

pip install -r requirements.txt
sh verify_sdv1_wb_attack.sh 
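
For intuition, the verification boils down to a (possibly masked) mean squared error between a generated image and the LAION-2B image retrieved for the same caption. Below is a minimal sketch of that check, not the repo's script: it assumes both images are already at the same resolution, the mask is a binary PNG, the file names are hypothetical, and the 0.01 threshold is illustrative.

    import numpy as np
    from PIL import Image

    def masked_mse(gen_path, laion_path, mask_path=None):
        # Load both images as float arrays in [0, 1].
        gen = np.asarray(Image.open(gen_path).convert("RGB"), dtype=np.float32) / 255.0
        ref = np.asarray(Image.open(laion_path).convert("RGB"), dtype=np.float32) / 255.0
        err = (gen - ref) ** 2
        if mask_path is None:
            return err.mean()
        # For template verbatims, only score pixels inside the extracted mask.
        mask = np.asarray(Image.open(mask_path).convert("L"), dtype=np.float32) / 255.0
        return (err * mask[..., None]).sum() / (3.0 * mask.sum() + 1e-8)

    # Flag a verbatim copy when the (masked) MSE is small.
    print(masked_mse("generated.png", "laion.png") < 0.01)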

Update 7.24

You may now run our whitebox (WB) and blackbox (BB) attacks against SDV1 and SDV2. For instance, to run the BB attack against SDV2:

sh configs/run_bb_attack_30k_sdv2.sh

This script downloads the file membership_attack_top30k.parquet from Hugging Face, which contains the 30k captions with the top scores from our whitebox attack on SDV1 (the corresponding "BB" attacks against SDV1 are not entirely blackbox, but are included to study the attack). A whitebox attack can be performed on a much larger caption set, selected through deduplication, with:

sh configs/run_wb_attack_2M.sh
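
Once downloaded, the parquet can be inspected directly. A minimal sketch with pandas; the column names are not documented here, so check df.columns for the real schema:

    import pandas as pd

    # Inspect the caption set downloaded by the attack scripts.
    df = pd.read_parquet("membership_attack_top30k.parquet")
    print(df.columns.tolist())  # discover the actual schema
    print(df.head())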

References

[1] Carlini et al. "Extracting Training Data from Diffusion Models." arXiv preprint arXiv:2301.13188, 2023.

[2] Webster. "A Reproducible Extraction of Training Images from Diffusion Models." arXiv preprint arXiv:2305.08694, 2023.

Some regurgitated prompts

Regurgitated prompts can be found in the following parquet files. They are labeled 'MV', 'RV', or 'TV' (matched, retrieval, and template verbatims, respectively) in the 'overfit_type' field; a filtering sketch follows below:

Deep Image Floyd

Midjourney v4

SDV1 blackbox

SDV2 blackbox

Realistic Vision

*** NOTE: the Midjourney prompts only apply to v4. The newer version (v5) seems to have mitigated the problem, so you must append --v 4 to your prompts. ***
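
A minimal sketch for filtering one of these parquets by overfit type; the file name here is hypothetical:

    import pandas as pd

    df = pd.read_parquet("sdv2_blackbox.parquet")  # hypothetical file name
    # 'overfit_type' labels each prompt as 'MV', 'RV', or 'TV'.
    templates = df[df["overfit_type"] == "TV"]
    print(len(templates), "template verbatim prompts")
    print(templates.head())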

Midjourney examples

| prompt | type |
| --- | --- |
| Prince Reunites With Warner Brothers, Plans New Album --seed 2 --stylize 0 --stop 10 --v 4 | Exact |
| Will Ferrell, John C. Reilly in Talks for *Border Guards* --seed 0 --stylize 0 --stop 10 --v 4 | Exact |
| Design Art Light in Dense Fall Forest Landscape Oversized Circle Wall Art --seed 4 --q 2 --v 4 | Template |
| Shaw Floors Spice It Up Tyler Taupe 00103_E9013 --seed 16 --stylize 0 --stop 10 --v 4 | Template |

Stable Diffusion V1

| prompt | type |
| --- | --- |
| Galaxy Women's Leather Boots | Template |
| Rambo 5 und Rocky Spin-Off - Sylvester Stallone gibt Updates | Exact |
| Full body U-Zip main opening - Full body U-Zip main opening on front of bag for easy unloading when you get to camp | Exact |
| Mothers influence on her young hippo | Exact |

Deep Image Floyd

| prompt | type |
| --- | --- |
| Designart Green Abstract Metal Grill ContemporaryArt On Canvas - 5 Panels | Template |
| New York Luggage Cover - New York / S - Luggage covers | Template |
| Foyer painted in HABANERO | Template |
| Shaw Floors Value Collections Sandy Hollow Cl II Net Alpine Fern 00305_5E510 | Template |

Stable Diffusion V2

| prompt | type |
| --- | --- |
| Pencil pleat curtains in collection Velvet, fabric: 704-18 | Template |
| Skull Of A Skeleton With A Burning Cigarette - Vincent Van Gogh Wall Tapestry | Template |
| Shaw Floors Couture' Collection Ultimate Expression 15′ Sahara 00205_19829 | Template |
| Sting Like A Bee By Louisa - Throw Pillow | Template |

Some other files

Top 30K scores for the whitebox attack

The prompts for the 2M most duplicated images

Template Verbatims

[figure: templates_fig]

Template verbatims for various networks: left is the generated image, middle is the retrieved image, and right is the extracted mask. Template verbatims originate from images that vary in fixed spatial locations in LAION-2B, for instance the carpet color in an e-commerce image (top left). These images are generated in a many-to-many fashion: for instance, the same prompt will generate both the top-left and bottom-right images, which come from the "Shaw Floors" prompts.

Idea behind attack

[figure: attack_model]

Training images can be extracted from Stable Diffusion in one step. In the first row, a verbatim copy is synthesized from the caption corresponding to the image in the second-to-last column. In the second row, we present verbatim copies that are harder to detect: template verbatims. These typically represent many-to-many mappings (many captions synthesize many verbatim templates), so the ground truth is constructed with retrieval (rightmost column). Non-verbatims have no match, even when retrieving over the entire dataset.

Our attack exploits this fast appearance by separating the realistic images in the first two columns from the blurry one in the last column.
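
To make the one-step idea concrete, here is a minimal sketch, not the repo's implementation: it generates an image from a caption with a single denoising step via diffusers, then scores sharpness with a Laplacian-variance proxy (an assumption on our part; the actual attack uses its own scoring). Verbatim-memorized captions tend to resolve into sharp, realistic images after a single step, while ordinary captions remain blurry.

    import numpy as np
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    def one_step_sharpness(caption, seed=0):
        # Generate with a single denoising step.
        g = torch.Generator("cuda").manual_seed(seed)
        img = pipe(caption, num_inference_steps=1, generator=g).images[0]
        gray = np.asarray(img.convert("L"), dtype=np.float32)
        # Variance of a discrete Laplacian: high for sharp one-step
        # generations, low for the usual blurry ones.
        lap = (
            -4.0 * gray[1:-1, 1:-1]
            + gray[:-2, 1:-1] + gray[2:, 1:-1]
            + gray[1:-1, :-2] + gray[1:-1, 2:]
        )
        return float(lap.var())

    # Higher score -> more likely a verbatim copy; threshold by inspection.
    print(one_step_sharpness("Mothers influence on her young hippo"))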