
<img src="https://github.com/h-aboutalebi/DeepfakeArt/blob/main/images/logo.jpg" alt="logo" width="40%" height="40%"> Part of <img src="https://github.com/h-aboutalebi/DeepfakeArt/blob/main/images/genai4good.png" alt="genai4good" width="20%" height="20%">

<figure class="image">
  <img src="https://github.com/h-aboutalebi/DeepfakeArt/blob/main/images/all.jpg" alt="inpainting">
</figure>

# DeepfakeArt Challenge: Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection

The tremendous recent advances in generative artificial intelligence techniques have led to significant successes and promise in a wide range of applications, from conversational agents and textual content generation to voice and visual synthesis. Amid the rise of generative AI and its increasingly widespread adoption, there has been significant growing concern over its use for malicious purposes. In the realm of visual content synthesis using generative AI, key areas of significant concern have been image forgery (e.g., generation of images containing or derived from copyright content) and data poisoning (i.e., generation of adversarially contaminated images). Motivated to address these key concerns and to encourage responsible generative AI, we introduce the DeepfakeArt Challenge, a large-scale challenge benchmark dataset designed specifically to aid in the building of machine learning algorithms for generative AI art forgery and data poisoning detection. Comprising over 32,000 records across a variety of generative forgery and data poisoning techniques, each record consists of a pair of images that are either forgeries / adversarially contaminated or not. Each of the generated images in the DeepfakeArt Challenge benchmark dataset has been comprehensively quality checked. The DeepfakeArt Challenge is a core part of GenAI4Good, a global open source initiative for accelerating machine learning to promote the responsible creation and deployment of generative AI for good.

The generative forgery and data poisoning methods leveraged in the DeepfakeArt Challenge benchmark dataset include:

- Inpainting
- Style transfer
- Adversarial data poisoning
- Cutmix

Team Members:

- Hossein Aboutalebi
- Daniel Mao
- Carol Xu
- Alexander Wong

Partners:

<figure class="image"> <img src="https://github.com/h-aboutalebi/DeepfakeArt/blob/main/images/viplogo-1.png"> </figure>

The DeepfakeArt Challenge benchmark dataset is available here

The DeepfakeArt Challenge paper is available [here](https://arxiv.org/abs/2306.01272)

## Data distribution

The DeepfakeArt Challenge benchmark dataset encompasses over 32,000 records spanning a wide spectrum of generative forgery and data poisoning techniques. Each record is represented by a pair of images that are either forgeries / adversarially contaminated or not. Fig. 2 (a) depicts the overall distribution of data between forgery / adversarially contaminated records and untainted ones, while Fig. 2 (b) shows the distribution of data across the various generative forgery and data poisoning techniques. Notably, as shown in Fig. 2 (a), the dataset contains almost double the number of dissimilar pairs compared to similar pairs; with roughly two-thirds of the dataset comprising dissimilar pairs, identifying similar pairs is substantially more challenging.

<img src="https://github.com/h-aboutalebi/DeepfakeArt/blob/main/images/dist.png" alt="distribution of data">

## Inpainting Category

<figure class="image"> <img src="https://github.com/h-aboutalebi/DeepfakeArt/blob/main/images/inpainting.jpg" alt="inpainting"> <figcaption>Generated forgery images via inpainting.</figcaption> </figure>

The source dataset for the inpainting category is WikiArt (ref). Source images are sampled randomly from the dataset to generate forgery images. Each record in this category consists of three images:

- the source image
- the masked image
- the generated (inpainted) image

The prompt used to generate the inpainted image is: "generate a painting compatible with the rest of the image"

This category consists of more than 5,063 records. The original images are masked between 40% and 60%. We randomly applied one of the following masking schemes:

The code for the data generation in this category can be found here
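
For illustration, below is a minimal sketch of how an inpainting record could be produced, assuming the diffusers `StableDiffusionInpaintPipeline` and a simple random rectangular mask. The checkpoint name, mask shape, and coverage logic are illustrative assumptions, not the repository's exact implementation.

```python
# Illustrative sketch only -- assumes the diffusers library and a single
# rectangular mask; the dataset's actual masking schemes and model may differ.
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

PROMPT = "generate a painting compatible with the rest of the image"

def random_rect_mask(width, height, coverage=(0.4, 0.6), rng=None):
    """Build a white-on-black rectangular mask covering ~40-60% of the image."""
    rng = rng or np.random.default_rng()
    target = rng.uniform(*coverage) * width * height
    w = int(min(width, max(1, np.sqrt(target * width / height))))
    h = int(min(height, target // w))
    x = int(rng.integers(0, width - w + 1))
    y = int(rng.integers(0, height - h + 1))
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[y:y + h, x:x + w] = 255  # white pixels are the region to inpaint
    return Image.fromarray(mask)

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")

source = Image.open("wikiart_sample.jpg").convert("RGB").resize((512, 512))
mask = random_rect_mask(*source.size)
forged = pipe(prompt=PROMPT, image=source, mask_image=mask).images[0]
forged.save("inpainted_forgery.png")
```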

## Style Transfer Category

<figure class="image"> <img src="https://github.com/h-aboutalebi/DeepfakeArt/blob/main/images/style.jpg"> <figcaption>Generated forgery images via style transfer.</figcaption> </figure>

The source dataset for the style transfer category is WikiArt (ref). Each record in this category consists of two images:

- the source image
- the generated (style-transferred) image

Guided by Canny edge detections and prompts, we selected 200 images from each sub-directory of the WikiArt dataset, except for Action_painting and Analytical_Cubism, which contain only 98 and 110 images, respectively. We utilized four distinct prompts, one per style, for the generations:

Each prompt was used for a quarter of the images from each sub-directory.

This category consists of more than 3,074 records.

The code for the data generation of this category can be found here
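
For illustration, below is a minimal sketch of Canny-guided generation, assuming a diffusers ControlNet (Canny) pipeline. The checkpoints and the style prompt are illustrative assumptions standing in for the dataset's four prompts.

```python
# Illustrative sketch only -- assumes a ControlNet (Canny) pipeline from
# diffusers; the checkpoints and prompt below are assumptions.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16  # assumed checkpoint
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

source = Image.open("wikiart_sample.jpg").convert("RGB").resize((512, 512))

# The Canny edge map of the source painting guides the generation.
edges = cv2.Canny(np.array(source), 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # ControlNet expects a 3-channel image
control = Image.fromarray(edges)

# A hypothetical style prompt; the dataset's four actual prompts differ.
styled = pipe("an oil painting in impressionist style", image=control).images[0]
styled.save("style_transfer_forgery.png")
```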

## Adversarial Data Poisoning Category

Images in this category are adversarially poisoned images generated using the RobustBench (ref) and torchattacks (ref) libraries on the WikiArt dataset. We used a ResNet50 model trained on ImageNet against:

For each source image, we report results for three attacks. The images are also center cropped to make detection harder.

Each record in this category consists of two images:

- the source image
- the generated (adversarially poisoned) image

This category consists of more than 2,730 records.

The code for this category can be found here
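
For illustration, below is a minimal sketch of producing one adversarially poisoned image, using PGD from torchattacks purely as an example attack on a torchvision ResNet50. The attack choice, its parameters, and the preprocessing are assumptions for illustration.

```python
# Illustrative sketch only -- PGD from torchattacks is used purely as an
# example attack; the dataset's specific attacks and parameters may differ.
import torch
import torch.nn as nn
import torchattacks
from PIL import Image
from torchvision import transforms
from torchvision.models import ResNet50_Weights, resnet50

class Normalized(nn.Module):
    """Fold ImageNet normalization into the model so attacks see [0, 1] inputs."""
    def __init__(self, net):
        super().__init__()
        self.net = net
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, x):
        return self.net((x - self.mean) / self.std)

model = Normalized(resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # records are center cropped to make detection harder
    transforms.ToTensor(),       # [0, 1] range, as torchattacks expects
])

image = preprocess(Image.open("wikiart_sample.jpg").convert("RGB")).unsqueeze(0)
label = model(image).argmax(dim=1)  # attack the model's own prediction

atk = torchattacks.PGD(model, eps=8 / 255, alpha=2 / 255, steps=10)  # assumed parameters
poisoned = atk(image, label)  # adversarially poisoned counterpart of the source image
```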

## Cutmix Category

This method was initially proposed in the paper "Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models" (ref).

In this section, images were generated using the "Cutmix" technique, which involves selecting a square patch of pixels from a source image and overlaying it onto a target image. Both the source and target images were randomly chosen from the WikiArt dataset. The patch size, source image extraction location, and target image overlay location were all determined randomly. This section contains 2,000 records.

The code for this section can be found here
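
For illustration, below is a minimal sketch of the Cutmix patch overlay described above; the patch size range and file names are illustrative assumptions.

```python
# Illustrative sketch only -- a simple Cutmix-style patch overlay; the patch
# size range and locations are drawn uniformly at random, as described above.
import numpy as np
from PIL import Image

rng = np.random.default_rng()

source = np.array(Image.open("wikiart_source.jpg").convert("RGB"))
target = np.array(Image.open("wikiart_target.jpg").convert("RGB"))

# Random square patch size, bounded by both images (assumed size range).
limit = min(source.shape[0], source.shape[1], target.shape[0], target.shape[1])
size = int(rng.integers(limit // 4, limit // 2 + 1))

# Random extraction location in the source...
sy = int(rng.integers(0, source.shape[0] - size + 1))
sx = int(rng.integers(0, source.shape[1] - size + 1))
patch = source[sy:sy + size, sx:sx + size]

# ...and random overlay location in the target.
ty = int(rng.integers(0, target.shape[0] - size + 1))
tx = int(rng.integers(0, target.shape[1] - size + 1))
mixed = target.copy()
mixed[ty:ty + size, tx:tx + size] = patch

Image.fromarray(mixed).save("cutmix_record.png")
```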

## Citation

Aboutalebi, Hossein, Daniel Mao, Carol Xu, and Alexander Wong. "DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection." arXiv preprint arXiv:2306.01272 (2023).