News

2023.9.26: Together with the paper Visual Realism Assessment for Face-swap Videos, we release the DFGC-VRA benchmark dataset under the DFGC-2022 dataset folder. Please sign the forms below to apply for this data.

Overview

The DFGC-2022 dataset originates from the Second DeepFake Game Competition, held with IJCB-2022. The competition provides a common platform for benchmarking the game between current state-of-the-art DeepFake creation and detection methods. An overview of the competition and the dataset is given in the following figures and tables; see the competition summary paper for more details. The dataset has 4394 video clips in total, of which 2799 are face-swap DeepFake videos created with various methods, post-processing steps, and compression levels. Researchers in both face-swap creation and DeepFake detection may find this dataset useful.

<img src="./figs/dataset.png" alt="image2" style="zoom:50%;" />

Application

The dataset is for non-profit research use only. To use it, applicants must agree to the terms, sign the application form (Google Form or Tencent Form), and cite the following paper.

@inproceedings{peng2022dfgc,
title={DFGC 2022: The Second DeepFake Game Competition},
author={Peng, Bo and Xiang, Wei and Jiang, Yue and Wang, Wei and Dong, Jing and Sun, Zhenan and Lei, Zhen and Lyu, Siwei},
booktitle={2022 IEEE International Joint Conference on Biometrics (IJCB)},
pages={1--10},
year={2022},
organization={IEEE}
}

The Creation Dataset

It has two sub-folders: Resource Clips, which contains all original clips used as source material for creating face-swaps, and Result Clips, which contains all face-swap submissions from the three evaluation rounds of the DFGC-2022 creation track. The Resource Clips sub-folder has the file structure ./GroupIndex/ID-Index/ClipIndex.mp4 and holds 20 groups, each containing 2 identities to be mutually face-swapped. The total number of resource clips is 505.

Resource Clips  
│
└───1 (Group Index)
│   │
│   └───1 (ID Index)
│   │   │   1.mp4 (Clip Name)
│   │   │   2.mp4
│   │   │   ...
│   │
│   └───2
│       │   1.mp4
│       │   2.mp4
│       │   ...
│   
└─── ...
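As a minimal sketch of how this layout could be traversed, the Python snippet below groups clips by (group, identity). The function name `index_resource_clips` and the root path in the usage note are illustrative assumptions, not part of the dataset release.

```python
from pathlib import Path


def index_resource_clips(root):
    """Map (group, identity) -> list of clip paths found under root.

    Assumes the layout root/GroupIndex/ID-Index/ClipIndex.mp4 described above.
    """
    index = {}
    for clip in Path(root).glob("*/*/*.mp4"):
        group, identity = clip.parts[-3], clip.parts[-2]
        index.setdefault((group, identity), []).append(clip)
    for clips in index.values():
        # Sort numerically so that 10.mp4 comes after 2.mp4.
        clips.sort(key=lambda p: int(p.stem) if p.stem.isdigit() else -1)
    return index
```

For example, `index_resource_clips("Resource Clips")[("1", "2")]` would list the clips of ID-2 in Group-1, assuming the folder has been extracted to the working directory.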

The Result Clips sub-folder has the following structure. It has 3 rounds, each containing several face-swap submissions from the competition participants or from the organizer baseline. The three text files metadata_C1.txt, metadata_C2.txt, and metadata_C3.txt each define the 80 face-swap clips to be made in the corresponding submission round. For example, the entry "1/1/3" means that 3.mp4 of ID-1 in Group-1 is to be face-swapped, with ID-2 of Group-1 as the identity donor.

Result Clips  
│   metadata_C1.txt
│   metadata_C2.txt
│   metadata_C3.txt
│   Method Descriptions.xlsx
│
└───Round1 (Round Name)
│   │
│   └───organizer-baseline (Submission Name)
│   │   │
│   │   └───1 (Group Index)
│   │   │   │
│   │   │   └───1 (ID Index)
│   │   │   │   │   3.mp4 (Clip Name)
│   │   │   │   │   7.mp4
│   │   │   │
│   │   │   └───2
│   │   │       │   2.mp4
│   │   │       │   4.mp4
│   │   │
│   │   └─── ...
│   │
│   └───JoyFang-2
│   │   └─── ...
│   └─── ...
│   
└───Round2
│   └─── ...
│
└───Round3
    └─── ...
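The mapping from a metadata entry to file locations can be sketched as follows. This is a hedged illustration: it assumes each entry has the form group/id/clip as in the "1/1/3" example, and that the two identities in a group are numbered 1 and 2 as in the trees above. The helper names `parse_entry` and `result_clip` are hypothetical.

```python
from pathlib import PurePosixPath


def parse_entry(entry):
    """Split a 'group/id/clip' entry into (target clip, donor identity folder).

    Both paths are relative to the Resource Clips folder.
    """
    group, identity, clip = entry.strip().split("/")
    # Assumption: each group holds exactly identities "1" and "2".
    donor = {"1": "2", "2": "1"}[identity]
    target_clip = PurePosixPath(group) / identity / f"{clip}.mp4"
    donor_dir = PurePosixPath(group) / donor
    return target_clip, donor_dir


def result_clip(round_name, submission, entry):
    """Path of the face-swapped clip inside Result Clips for one submission."""
    target_clip, _ = parse_entry(entry)
    return PurePosixPath("Result Clips") / round_name / submission / target_clip
```

With these helpers, `parse_entry("1/1/3")` yields the target `1/1/3.mp4` and the donor folder `1/2`, and `result_clip("Round1", "organizer-baseline", "1/1/3")` points at the corresponding baseline result.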

The Method Descriptions.xlsx file contains descriptions of face-swap methods used to create each submission. The evaluation results for these face-swap submissions can be seen at https://codalab.lisn.upsaclay.fr/competitions/2149#learn_the_details-evaluation. For example, the 3rd round evaluation results are as follows:

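| submission | realism | mouth | quality | expression | ID | anti-detection | sum |
|---|---|---|---|---|---|---|---|
| JoyFang (#7) | 4.04 | 4.095 | 4.16 | 3.89 | 3.6 | 5.04 | 24.825 |
| chinatelecom_cv (#4) | 3.79 | 4.05 | 4.15 | 3.91 | 3.065 | 5.106 | 24.071 |
| JoyFang (#8) | 3.97 | 4.08 | 3.995 | 3.905 | 3.61 | 4.441 | 24.001 |
| chinatelecom_cv (#5) | 3.555 | 3.925 | 4.075 | 3.815 | 2.96 | 4.895 | 23.225 |
| felixrosberg (#3) | 3.555 | 3.94 | 3.825 | 3.78 | 3.15 | 4.093 | 22.343 |
| felixrosberg (#2) | 3.48 | 3.965 | 3.85 | 3.715 | 3.185 | 4.092 | 22.287 |
| organizer-baseline | 3.79 | 4.015 | 4.03 | 3.88 | 3.84 | 2.625 | 22.18 |
| Lio (#3) | 3.915 | 4.07 | 3.79 | 3.88 | 3.715 | 2.762 | 22.132 |
| Taeko (#2) | 3.72 | 3.945 | 3.98 | 3.735 | 3.435 | 3.313 | 22.128 |
| wany (#2) | 3.955 | 4.055 | 3.7 | 3.88 | 3.625 | 2.695 | 21.91 |
| Lio (#2) | 3.87 | 4.09 | 3.905 | 3.865 | 3.735 | 2.386 | 21.851 |
| wany (#1) | 3.96 | 4.085 | 3.905 | 3.885 | 3.62 | 2.329 | 21.784 |
| jiachenlei (#8) | 2.6 | 3.32 | 3.245 | 3.085 | 2.625 | 2.633 | 17.508 |
| winterfell (#4) | 3.17 | 3.625 | 3.24 | 3.5 | 2.945 | 0.767 | 17.247 |
| jiachenlei (#7) | 2.51 | 3.295 | 3.54 | 3.03 | 2.61 | 2.095 | 17.08 |
| winterfell (#5) | 1.805 | 2.88 | 3.385 | 2.67 | 2.195 | 0.695 | 13.63 |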
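In the results above, the sum column appears to be the unweighted total of the six preceding scores. The short Python check below illustrates this for the JoyFang (#7) row. The values are copied from the third-round results, and the function name `total_score` is an assumption made for this sketch.

```python
import math


def total_score(scores):
    """Unweighted sum of the six per-aspect scores of one submission."""
    return sum(scores.values())


# Third-round scores for JoyFang (#7), copied from the results above.
joyfang_7 = {
    "realism": 4.04,
    "mouth": 4.095,
    "quality": 4.16,
    "expression": 3.89,
    "ID": 3.6,
    "anti-detection": 5.04,
}
reported_sum = 24.825

# Compare with a tolerance because the scores are floating-point values.
matches = math.isclose(total_score(joyfang_7), reported_sum, abs_tol=1e-6)
```

The same check can be applied to any other row when comparing a new method against the listed submissions.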

Dataset users can compare their own face-swap methods with those of the DFGC-2022 participants on this dataset, using their preferred evaluation metrics and protocols.

The Detection Dataset

The dataset contains the two subsets used in the DFGC-2022 competition: the Public set, which contains 2228 real and fake video clips, and the Private set, which contains 2166 video clips. The two sets have disjoint IDs. We release only the Private-1 part (described in the competition summary paper), since Private-2 is subject to copyright restrictions. The ground-truth labels are in Public Label.json and Private Label.json, where label 0 denotes real and 1 denotes fake.

Detection Dataset  
│   Public Label.json
│   Private Label.json
│   Detection Dataset Meta.csv
│
└───Public
│   │   aaflfhiktb.mp4
│   │   aapifkmrli.mp4
│   │   abbefwjpxv.mp4
│   │   ...
│   
└───Private
    │   ...
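A minimal Python sketch for reading a label file is shown below. It assumes, without confirmation from the release, that each JSON file is a single object mapping a video file name to 0 or 1; adjust the parsing if the actual structure differs. The function names are illustrative.

```python
import json
from collections import Counter


def load_labels(path):
    """Load a label file, assumed to map video file name -> 0 (real) or 1 (fake)."""
    with open(path, encoding="utf-8") as f:
        labels = json.load(f)
    return {name: int(label) for name, label in labels.items()}


def count_labels(labels):
    """Count real (0) and fake (1) clips in a loaded label mapping."""
    counts = Counter(labels.values())
    return {"real": counts[0], "fake": counts[1]}
```

For example, `count_labels(load_labels("Detection Dataset/Public Label.json"))` would report the real/fake split of the Public set under the stated assumption.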

The file Detection Dataset Meta.csv holds detailed meta-information for each clip. It is best viewed in a plain-text viewer, because Microsoft Excel may misinterpret some fields as dates. The meta-information includes "videoName", "label", "split", "target", "compression", and "face-swap method". The "target" is the original resource video (groupIndex/idIndex/clipIndex) that a fake video originates from, or the groupIndex/idIndex that a real video originates from. A real video may not correspond exactly to a resource video clip, so for real videos we only narrow the target down to the idIndex. The "compression" is the quality level of the ffmpeg compression we applied to submitted videos: "c40", "c23", and "none" denote heavy compression, light compression, and no compression, respectively. The "face-swap method" applies only to fake videos and is the submission identifier from the creation track; refer to the Result Clips in the creation dataset.
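One way to avoid the spreadsheet date problem is to read the file with Python's standard `csv` module, which keeps every field as a string. The sketch below assumes the header row uses exactly the column names listed above and that "label" is stored as 0 or 1; the function names are illustrative.

```python
import csv


def read_meta(path):
    """Read the meta CSV into a list of dicts; all fields stay plain strings."""
    # utf-8-sig tolerates an optional byte-order mark at the start of the file.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def fakes_by_compression(rows, level):
    """Names of fake clips with the given compression: 'c40', 'c23' or 'none'."""
    return [
        row["videoName"]
        for row in rows
        if row["label"] == "1" and row["compression"] == level
    ]
```

Because nothing is converted, a "target" such as 1/1/3 remains the literal string "1/1/3" and can be split on "/" to recover the group, identity, and clip indices.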

Dataset users can use it to train or test their DeepFake detection models or methods. The top-3 results of the DFGC-2022 detection track are as follows:

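| Team | Public | Private-1 | Private-2 | Private | Rank |
|---|---|---|---|---|---|
| HanChen | 0.9521 | 0.9178 | 0.8955 | 0.9085 | 1 |
| OPDAI | 0.9297 | 0.8836 | 0.8511 | 0.8672 | 2 |
| guanjz | 0.9483 | 0.911 | 0.7461 | 0.8670 | 3 |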

Related Projects

Code for the "Visual Realism Assessment for Face-swap Videos": https://github.com/XianyunSun/VRA

DFGC-2022 detection track first place solution: https://github.com/chenhanch/DFGC-2022-1st-place

DFGC-2021 starter-kit and released dataset: https://github.com/bomb2peng/DFGC_starterkit