Awesome

<h2>WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection</h2> <p align="center"> <img src="./fakemask.jpg" width="720" height="435px" alt="Deepfake in the Wild" title="Deepfake in the Wild" align="center"></img> </p>

📌 Dataset Description

Existing deepfake datasets like DeepfakeDetection and FaceForensics++ have advanced detection research but are limited by constrained real videos featuring a few actors and fake videos generated using popular software. As a result, detectors trained on these datasets often struggle with the diversity of real-world deepfakes found online.

To address this, we introduce WildDeepfake, a dataset of 7,314 face sequences from 707 deepfake videos sourced entirely from the internet. Despite its small size, WildDeepfake better represents the challenges of real-world detection, where baseline detectors show significantly reduced performance.

To enhance detection, we also propose Attention-based Deepfake Detection Networks (ADDNets), utilizing 2D and 3D attention mechanisms to improve focus on real/fake facial features.

📂 Dataset Contents

A comparision to previous datasets (before our work)

Dataset name	Download	Generate method	Deepfake videos	Actors
Deepfake-TIMIT low	download	Deepfake	320	32
Deepfake-TIMIT high	download	Deepfake	320	32
Faceforensics	-	Deepfake	1000	977
Faceforensics++	download	Deepfake	1000	977
Deepfake detection	download	Deepfake	over3000	28
Celeb-deepfakeforensics v1	download	Deepfake	795	13
Celeb-deepfakeforensics v2	download	Deepfake	590	59
DFDC	download	Deepfake	-	-
WildDeepfake	download	Internet	707	Unknown

File Structure:

deepfake_in_the_wild
                    |--real train
                                 |--0.tar.gz
                                 |--1.tar.gz
                                 |--2.tar.gz
                                 ...
                    |--real test
                                |--0.tar.gz
                                |--1.tar.gz
                                |--2.tar.gz
                                ...
                    |--fake train
                                 |--0.tar.gz
                                 |--1.tar.gz
                                 |--2.tar.gz
                                 ...
                    |--fake test
                                |--0.tar.gz
                                |--1.tar.gz
                                |--2.tar.gz
                                ...

In each tar.gz file, there will be several folders containing face images, and the images in each folder represent a face sequence. The image name in the folder represents the frame number it appears in the original video.

⬇️ Request for Download

You will need to fill an agreement form to use the dataset, which is now avalibble on Hugging Face click to download.

📜 Cite Us

If you use this dataset in your research, please cite it as follows:

@inproceedings{zi2020wilddeepfake,
  title={Wilddeepfake: A challenging real-world dataset for deepfake detection},
  author={Zi, Bojia and Chang, Minghao and Chen, Jingjing and Ma, Xingjun and Jiang, Yu-Gang},
  booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
  pages={2382--2390},
  year={2020}
}

📝 Privacy Statement

To ensure the privacy of individuals featured in the dataset, we have implemented the following measures:

Restricted Use: The dataset is strictly for research purposes, and only face sequences are released, not full videos.
Privacy Protection in Publications: Key facial features are obscured in all visual materials, including papers and presentations. Additionally, strict access controls are in place.
Applicant Verification: Access is granted only after verifying the applicant’s academic email address, personal electronic signature, and other necessary credentials.
Usage Agreement: Applicants are required to sign a comprehensive agreement to ensure the dataset is used exclusively for research purposes.
Right to Removal: If any part of the dataset impacts you, please contact us to request its removal.

We are committed to safeguarding privacy while enabling research advancements.