Awesome-Diffusion-Replication

:fire::fire: This repository contains a collection of papers on replication in visual diffusion models. We divide the replication phenomenon into three aspects: unveiling, understanding, and mitigation. We also provide papers focusing on its real-world influence.

Note that some papers may cover more than one aspect.

Abstract

Visual diffusion models have revolutionized the field of creative AI, producing high-quality and diverse content. However, they inevitably memorize training images or videos, subsequently replicating their concepts, content, or styles during inference. This phenomenon raises significant concerns about privacy, security, and copyright within generated outputs. In this survey, we provide the first comprehensive review of replication in visual diffusion models, marking a novel contribution to the field by systematically categorizing the existing studies into unveiling, understanding, and mitigating this phenomenon. Specifically, unveiling mainly refers to the methods used to detect replication instances. Understanding involves analyzing the underlying mechanisms and factors that contribute to this phenomenon. Mitigation focuses on developing strategies to reduce or eliminate replication. Beyond these aspects, we also review papers focusing on its real-world influence. For instance, in the context of healthcare, replication is a critical concern because of the privacy risks surrounding patient data. Finally, the paper concludes with a discussion of the ongoing challenges, such as the difficulty of detecting and benchmarking replication, and outlines future directions, including the development of more robust mitigation techniques. By synthesizing insights from diverse studies, this paper aims to equip researchers and practitioners with a deeper understanding of the intersection between AI technology and social good.

Contact

If we miss your awesome paper(s) on replication in visual diffusion models, please feel free to open an issue or contact Wenhao Wang (wangwenhao0716@gmail.com).

Citation

@article{wang2024replication,
  title={Replication in Visual Diffusion Models: A Survey and Outlook},
  author={Wang, Wenhao and Sun, Yifan and Yang, Zongxin and Hu, Zhengdong and Tan, Zhentao and Yang, Yi},
  journal={arXiv preprint arXiv:2408.00001},
  year={2024}
}

Contents

Unveiling

<img src="https://github.com/WangWenhao0716/Awesome-Diffusion-Memorization-Replication/blob/main/unveil.png" width="1000">

Prompting

Specific

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
[CVPR 2023] [Code]

Not with my name! Inferring artists’ names of input strings employed by Diffusion Models
[ICIAP 2023] [Code]

Extracting Training Data from Diffusion Models
[USENIX 2023]

A Reproducible Extraction of Training Images from Diffusion Models
[Arxiv 2023] [Code]

Understanding (Un)Intended Memorization in Text-to-Image Generative Models
[Arxiv 2023]

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline
[NeurIPSW 2023]

Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models
[CCS 2023] [Code]

Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
[ACLW 2023]

On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
[Arxiv 2023]

Social Biases through the Text-to-Image Generation Lens
[AIES 2023]

Implicit

Understanding (Un)Intended Memorization in Text-to-Image Generative Models
[Arxiv 2023]

On Copyright Risks of Text-to-Image Diffusion Models
[Arxiv 2023]

Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery
[NeurIPS 2023]

Membership inference

White-box

An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization
[ICLR 2024] [Code]

Loss and Likelihood Based Membership Inference of Diffusion Models
[ISC 2023]

Membership Inference Attacks against Diffusion Models
[SPW 2023]

White-box Membership Inference Attacks against Diffusion Models
[Arxiv 2023] [Code]

Membership Inference Attacks on Diffusion Models via Quantile Regression
[Arxiv 2023]
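
Most attacks in this group share a simple core: query the model's denoising loss (or a proxy for it) on a candidate sample, and treat unusually low loss as evidence of membership. The sketch below illustrates that loss-thresholding idea in PyTorch; `eps_model` and the threshold are hypothetical stand-ins, not any specific paper's attack.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the target model's noise predictor eps_theta(x_t, t);
# a white-box attack would query the real UNet here.
def eps_model(x_t, t):
    return 0.9 * x_t  # toy predictor

def denoising_loss(x0, t, alpha_bar):
    """Per-sample denoising error at timestep t. Training members tend to
    incur lower error than unseen samples."""
    noise = torch.randn_like(x0)
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise
    return ((eps_model(x_t, t) - noise) ** 2).flatten(1).mean(dim=1)

candidates = torch.randn(8, 3, 32, 32)  # samples under test
loss = denoising_loss(candidates, t=100, alpha_bar=torch.tensor(0.7))
threshold = 1.0                         # calibrated on known non-members
print(loss < threshold)                 # True => predicted training member
```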

Black-box

Membership Inference Attacks against Diffusion Models
[SPW 2023]

Membership Inference Attacks Against Text-to-image Generation Models
[Arxiv 2022]

A Probabilistic Fluctuation based Membership Inference Attack for Diffusion Models
[Arxiv 2023]

Set-Membership Inference Attacks using Data Watermarking
[Arxiv 2023]

Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models
[WACV 2024]

Black-box Membership Inference Attacks against Fine-tuned Diffusion Models
[Arxiv 2023]

Towards More Realistic Membership Inference Attacks on Large Diffusion Models
[WACV 2024]

Similarity retrieval

Content similarity

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
[CVPR 2023] [Code]

Generation or Replication: Auscultating Audio Latent Diffusion Models
[ICASSP 2024] [Project]

Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
[Arxiv 2024]

CopyScope: Model-level Copyright Infringement Quantification in the Diffusion Workflow
[Arxiv 2023]

DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection
[Arxiv 2023] [Code]

CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
[CVPR 2024] [Code]
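
The recipe shared by these detectors: embed generated images and the training set with the same feature extractor, retrieve each generation's nearest training neighbors, and flag pairs whose similarity exceeds a threshold. A minimal sketch of the retrieval step follows; `embed` is a hypothetical placeholder for whatever descriptor a given paper uses (copy-detection or CLIP-style features, for example).

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(images):
    """Hypothetical feature extractor; returns L2-normalized descriptors
    so that dot products are cosine similarities."""
    feats = rng.normal(size=(len(images), 512))
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

train_feats = embed(range(10_000))   # index over the training images
gen_feats = embed(range(16))         # freshly generated images

sims = gen_feats @ train_feats.T     # cosine similarity to every training image
top_sim = sims.max(axis=1)
nearest = sims.argmax(axis=1)

THRESHOLD = 0.5                      # tuned on pairs known not to replicate
for i, (s, j) in enumerate(zip(top_sim, nearest)):
    if s > THRESHOLD:
        print(f"generation {i} may replicate training image {j} (sim={s:.2f})")
```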

Style similarity

Measuring Style Similarity in Diffusion Models
[Arxiv 2024] [Code]

AnyPattern: Towards In-context Image Copy Detection
[Arxiv 2024] [Project]

Measuring the Success of Diffusion Models at Imitating Human Artists
[ICMLW 2023]

Watermarking

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models
[ICLR 2024] [Code]

DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models
[Arxiv 2023]

FT-SHIELD: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models
[Arxiv 2023]

Steal My Artworks for Fine-tuning? A Watermarking Framework for Detecting Art Theft Mimicry in Text-to-Image Models
[Arxiv 2023]
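
These methods plant an imperceptible signal in protected images so that, if a model is later trained or fine-tuned on them, the signal resurfaces in its generations and can be detected. The learned watermarks above are far more robust than this, but the embed-then-detect workflow can be shown with a deliberately simple spatial-domain toy (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed pseudo-random pattern serves as the owner's secret key.
key = rng.choice([-1.0, 1.0], size=(64, 64))

def embed(image, strength=2.0):
    """Add a faint key-correlated perturbation to a grayscale image."""
    return np.clip(image + strength * key, 0, 255)

def detect(image):
    """Correlate the zero-mean image with the key; watermarked images
    yield a clearly positive score."""
    return float(((image - image.mean()) * key).mean())

clean = rng.uniform(0, 255, size=(64, 64))
print(detect(clean), detect(embed(clean)))  # near 0 vs. clearly positive
```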

Proactive replication

Fine-tuning

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
[CVPR 2023] [Code]

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
[ICLR 2023] [Project]

A Neural Space-Time Representation for Text-to-Image Personalization
[SIGGRAPH Asia 2023] [Project]

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
[SIGGRAPH Asia 2023] [Project]

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
[Arxiv 2023] [Project]

Subject-driven Text-to-Image Generation via Apprenticeship Learning
[NeurIPS 2023] [Project]

Inversion-Based Style Transfer with Diffusion Models
[CVPR 2023] [Code]

Customizing Text-to-Image Models with a Single Image Pair
[Arxiv 2024]

Multi-Concept Customization of Text-to-Image Diffusion
[CVPR 2023] [Code]

Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models
[ACM Transactions on Graphics 2023] [Project]

Training-free

Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
[Arxiv 2023] [Project]

Novel perspectives

Magnitude of noise

Detecting, Explaining, and Mitigating Memorization in Diffusion Models
[ICLR 2024] [Code]
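
This work observes that memorized prompts induce an unusually large gap between the text-conditional and unconditional noise predictions, which yields a cheap detection signal. A schematic of that magnitude metric, with a toy predictor standing in for the real text-to-image UNet (the model and tensor shapes are hypothetical):

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a conditional noise predictor eps_theta(x_t, t, c).
def eps_model(x_t, t, cond_emb):
    shift = cond_emb.mean(dim=(1, 2)).view(-1, 1, 1, 1)  # toy conditioning effect
    return 0.1 * x_t + shift

def noise_magnitude(x_t, t, cond_emb, uncond_emb):
    """||eps(x_t, c) - eps(x_t, empty)||: large values suggest the prompt
    is memorized and likely to trigger replication."""
    diff = eps_model(x_t, t, cond_emb) - eps_model(x_t, t, uncond_emb)
    return diff.flatten(1).norm(dim=1)

x_t = torch.randn(4, 4, 64, 64)   # noisy latents for 4 prompts
cond = torch.randn(4, 77, 768)    # prompt embeddings (illustrative shape)
uncond = torch.randn(4, 77, 768)  # empty-prompt embeddings
print(noise_magnitude(x_t, 0, cond, uncond))  # flag scores above a threshold
```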

Training data attribution

Evaluating Data Attribution for Text-to-Image Models
[ICCV 2023]

The Journey, Not the Destination: How Data Guides Diffusion Models
[ICMLW 2023] [Code]

Cross attention

Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
[Arxiv 2024]

Fine-tune to leak

Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk
[Arxiv 2024]

Overfitted MAE

Detecting Generative Parroting through Overfitting Masked Autoencoders
[CVPRW 2024]

Property inference

Property Existence Inference against Generative Models
[USENIX 2024] [Code]

Understanding

<img src="https://github.com/WangWenhao0716/Awesome-Diffusion-Memorization-Replication/blob/main/understanding.png" width="1000">

Data

Insufficient training data

On Memorization in Diffusion Models
[Arxiv 2023] [Code]

Image duplication

Understanding and Mitigating Copying in Diffusion Models
[NeurIPS 2023] [Code]

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
[CVPR 2023] [Code]

Towards Memorization-Free Diffusion Models
[Arxiv 2024]

Misleading captions

Understanding and Mitigating Copying in Diffusion Models
[NeurIPS 2023] [Code]

Memory Triggers: Unveiling Memorization in Text-To-Image Generative Models through Word-Level Duplication
[Arxiv 2023]

On Memorization in Diffusion Models
[Arxiv 2023] [Code]

Understanding Data Replication in Diffusion Models
[ICMLW 2023]

Data types

Outliers Memorized Last: Trends in Memorization of Diffusion Models Based on Training Distribution and Epoch
[OpenReview 2024]

Methods

Deterministic sampler

On the Generalization of Diffusion Model
[Arxiv 2023]

The Emergence of Reproducibility and Consistency in Diffusion Models
[Arxiv 2023]

Model capacity

Understanding and Mitigating Copying in Diffusion Models
[NeurIPS 2023] [Code]

Towards Memorization-Free Diffusion Models
[Arxiv 2024]

New metrics

Feature likelihood score: Evaluating the generalization of generative models using samples
[NeurIPS 2023] [Code]

Measuring Forgetting of Memorized Training Examples
[ICLR 2023]

A Good Score Does not Lead to A Good Generative Model
[Arxiv 2024]

Theory

Near access-freeness

On Provable Copyright Protection for Generative Models
[ICML 2023]

Dichotomy

Diffusion Probabilistic Models Generalize when They Fail to Memorize
[ICMLW 2023]

Geometry-adaptive

Generalization in diffusion models arises from geometry-adaptive harmonic representations
[ICLR 2024] [Code]

Data-(in)dependent

On the Generalization Properties of Diffusion Models
[NeurIPS 2023]

Mutual information

On the Generalization of Diffusion Model
[Arxiv 2023]

Creativity

Can AI Be as Creative as Humans?
[Arxiv 2024] [Project]

Mitigation

<img src="https://github.com/WangWenhao0716/Awesome-Diffusion-Memorization-Replication/blob/main/mitigation.png" width="1000">

Training data optimization

Deduplication

Understanding and Mitigating Copying in Diffusion Models
[NeurIPS 2023] [Code]

On the De-duplication of LAION-2B
[Arxiv 2023] [Code]

SemDeDup: Data-efficient learning at web-scale through semantic deduplication
[Arxiv 2023]

Towards Memorization-Free Diffusion Models
[Arxiv 2024]

Dataset Deduplication with Datamodels
[MIT Thesis 2022]

Mitigate Replication and Copying in Diffusion Models with Generalized Caption and Dual Fusion Enhancement
[ICASSP 2024] [Code]
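
Whatever the embedding, these pipelines share one goal: remove near-duplicate training images before training, since duplicated images are the ones most often replicated. A minimal sketch of threshold-based near-duplicate removal over normalized embeddings (placeholders throughout; SemDeDup, for instance, first clusters the embeddings and compares within clusters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder: L2-normalized embeddings of the training images.
feats = rng.normal(size=(1_000, 256))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

THRESHOLD = 0.95  # cosine similarity above which two images count as duplicates

keep = []
for i, f in enumerate(feats):
    # Greedily keep an image only if it is not too close to anything kept so far.
    if not keep or (feats[keep] @ f).max() < THRESHOLD:
        keep.append(i)

print(f"kept {len(keep)} of {len(feats)} images after deduplication")
```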

Protection

Differentially Private Diffusion Models Generate Useful Synthetic Images
[Arxiv 2023]

Differentially Private Diffusion Models
[TMLR 2023] [Project]

Improving Adversarial Attacks on Latent Diffusion Model
[Arxiv 2023]

Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation
[Arxiv 2023]

Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
[Arxiv 2024]

Mist: Towards Improved Adversarial Examples for Diffusion Models
[Arxiv 2023] [Code]

Anti-DreamBooth: Protecting users from personalized text-to-image synthesis
[ICCV 2023] [Project]

Toward robust imperceptible perturbation against unauthorized text-to-image diffusion-based synthesis
[CVPR 2024] [Code]

Simac: A simple anti-customization method against text-to-image synthesis of diffusion models
[CVPR 2024]

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
[Arxiv 2024]

DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization
[ICLRW 2024]

Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis
[Arxiv 2023]

PAG: Protecting Artworks from Personalizing Image Generative Models
[ICONIP 2023]

My Art My Choice: Adversarial Protection Against Unruly AI
[Arxiv 2023]

Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples
[ICML 2023] [Code]

Differential Privacy vs Detecting Copyright Infringement: A Case Study with Normalizing Flows
[ICMLW 2024]

MPCPA: Multi-Center Privacy Computing with Predictions Aggregation based on Denoising Diffusion Probabilistic Model
[Arxiv 2024]

DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
[Arxiv 2024]

Destruction-Restoration Suppresses Data Protection Perturbations against Diffusion Models
[ICTAI 2023]

Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
[Arxiv 2023]

VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
[Arxiv 2023] [Project]

IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI
[NeurIPS 2023]

Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models
[USENIX 2023]

Toward effective protection against diffusion-based mimicry through score distillation
[ICLR 2024] [Code]

Purification

CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
[NeurIPSW 2023]

Inventing art styles with no artistic training data
[Arxiv 2023]

Corruption

Ambient Diffusion: Learning Clean Distributions from Corrupted Data
[NeurIPS 2023]

Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data
[Arxiv 2024]

Machine unlearning

Erasing Concepts from Diffusion Models
[ICCV 2023] [Project]

Ablating Concepts in Text-to-Image Diffusion Models
[ICCV 2023] [Code]

Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
[Arxiv 2024]

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
[Arxiv 2023] [Code]

EraseDiff: Erasing Data Influence in Diffusion Models
[Arxiv 2024]

All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
[Arxiv 2023]

Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts
[Arxiv 2024]

Implicit Concept Removal of Diffusion Models
[Arxiv 2023]

Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
[Arxiv 2023]

Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation
[ICLR 2024] [Code]

Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
[NeurIPS 2023] [Code]

Machine Unlearning for Image-to-Image Generative Models
[ICLR 2024] [Code]

Espresso: Robust Concept Filtering in Text-to-Image Models
[Arxiv 2024]

Robust Concept Erasure Using Task Vectors
[Arxiv 2024]

Pruning for robust concept erasing in diffusion models
[Arxiv 2024]

SAFEGEN: Mitigating Unsafe Content Generation in Text-to-Image Models
[Arxiv 2024]

©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model
[Arxiv 2024]

Unified Concept Editing in Diffusion Models
[WACV 2024] [Project]

Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
[ICMLW 2023] [Code]

Separable Multi-Concept Erasure from Diffusion Models
[Arxiv 2024]

MACE: Mass Concept Erasure in Diffusion Models
[CVPR 2024]

Editing Massive Concepts in Text-to-Image Diffusion Models
[Arxiv 2024] [Project]

UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models
[Arxiv 2024]

Circumventing Concept Erasure Methods For Text-To-Image Generative Models
[ICLR 2024]

To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
[Arxiv 2023] [Code]

Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?
[ICLR 2024] [Code]

Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
[Arxiv 2024] [Project]
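
Many of these erasure methods share one move: fine-tune the noise predictor so that, when conditioned on the target concept, it matches a negatively guided prediction from a frozen copy of itself, steering generations away from the concept. A toy sketch of this ESD-style objective follows; the tiny linear modules are hypothetical stand-ins for a real UNet and text encoder.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyEps(nn.Module):
    """Stand-in noise predictor eps_theta(x_t, c); latent dim 16, cond dim 8."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16 + 8, 16)

    def forward(self, x_t, cond):
        return self.net(torch.cat([x_t, cond], dim=-1))

student = ToyEps()  # fine-tuned to forget the concept
frozen = ToyEps()
frozen.load_state_dict(student.state_dict())
for p in frozen.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
eta = 1.0           # negative-guidance strength

for step in range(200):
    x_t = torch.randn(32, 16)   # noisy latents
    c = torch.randn(32, 8)      # embedding of the concept to erase
    null = torch.zeros(32, 8)   # unconditional embedding
    with torch.no_grad():
        # Target pushes the conditional prediction away from the concept.
        target = frozen(x_t, null) - eta * (frozen(x_t, c) - frozen(x_t, null))
    loss = ((student(x_t, c) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final erasure loss: {loss.item():.4f}")
```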

Prompt disturbing

Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
[CVPR 2023] [Code]

Detecting, Explaining, and Mitigating Memorization in Diffusion Models
[ICLR 2024] [Code]

Towards Test-Time Refusals via Concept Negation
[NeurIPS 2023]

Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion
[ACM MM 2023]

Get what you want, not what you don’t: Image content suppression for text-to-image diffusion models
[ICLR 2024] [Code]
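
The common thread here is intervening on the prompt or its embedding at inference time rather than retraining the model. The simplest widely available relative of these techniques is negative prompting via classifier-free guidance; a minimal example with the diffusers library (the model ID and prompts are illustrative, and this is far cruder than the guidance schemes in the papers above):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a lighthouse on a cliff at dawn, oil painting",
    negative_prompt="in the style of a specific living artist",  # concept to avoid
    guidance_scale=7.5,
).images[0]
image.save("lighthouse.png")
```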

Novel perspectives

Composition

Training Data Protection with Compositional Diffusion Models
[Arxiv 2023]

Model immunizing

IMMA: Immunizing text-to-image Models against Malicious Adaptation
[Arxiv 2023]

Low-rank adaptation

Privacy-Preserving Low-Rank Adaptation for Latent Diffusion Models
[Arxiv 2024] [Code]

Despecification guidance

Towards Memorization-Free Diffusion Models
[Arxiv 2024]

Influence

<img src="https://github.com/WangWenhao0716/Awesome-Diffusion-Memorization-Replication/blob/main/influence.png" width="1000">

Regulation

Generative Artificial Intelligence and Copyright Law
[CRS Report]

AI and Law: The Next Generation
[SSRN 2023]

Foundation Models and Fair Use
[Arxiv 2023]

Generative AI meets copyright
[Science]

Talkin' 'Bout AI Generation: Copyright and the Generative-AI Supply Chain
[CSLAW 2024]

Generative AI Art: Copyright Infringement and Fair Use
[SSRN 2023]

Copyright Safety for Generative AI
[SSRN 2023]

How Generative AI Turns Copyright Upside Down
[SSRN 2023]

Analyzing Copyright Infringement by Artificial Intelligence: The Case of the Diffusion Model
[AJHSS 2023]

The Files are in the Computer: Copyright, Memorization, and Generative AI
[Arxiv 2024]

The Economics of Copyright in the Digital Age
[CESifo]

Can Copyright be Reduced to Privacy?
[Arxiv 2023]

Art

Understanding the Influence of Artificial Intelligence Art on Transaction in the Art World
[Theses 2023]

AI Art and its Impact on Artists
[AIES 2023]

Art and the science of generative AI
[Science 2023]

Can There be Art Without an Artist?
[NeurIPSW 2022]

AI Art: Artists’ Best Friend or Mortal Enemy?
[Essay 2023]

Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models
[Arxiv 2024]

Legal guide for the visual artist
[SS 2022]

Society

Stable Bias: Evaluating Societal Representations in Diffusion Models
[NeurIPS 2023]

Analyzing Bias in Diffusion-based Face Generation Models
[Arxiv 2023]

Auditing Gender Presentation Differences in Text-to-Image Models
[Arxiv 2023] [Project]

Stable Diffusion Exposed: Gender Bias from Prompt to Image
[Arxiv 2023]

Healthcare

Investigating Data Memorization in 3D Latent Diffusion Models for Medical Image Synthesis
[MICCAIW]

Privacy Distillation: Reducing Re-identification Risk of Diffusion Models
[MICCAIW]

Effect of Training Epoch Number on Patient Data Memorization in Unconditional Latent Diffusion Models
[BVMW]

Beware of diffusion models for synthesizing medical images -- A comparison with GANs in terms of memorizing brain MRI and chest x-ray images
[Arxiv 2023]

Unconditional Latent Diffusion Models Memorize Patient Imaging Data
[Arxiv 2023]

Brain tumor segmentation using synthetic MR images - a comparison of GANs and diffusion models
[Scientific Data]