Voice-Face Homogeneity Tells Deepfake

Code for 'Voice-Face Homogeneity Tells Deepfake' [[Arxiv]](https://arxiv.org/abs/2203.02195), which detects deepfake videos by checking whether the voice and the face in a video match.

Data Preparation

  1. Download the DFDC, DF-TIMIT, or FakeAVCeleb dataset.

  2. Extract the frames and audio from the videos and store them in the format described in ./lists/[Dataset]/train_frame.txt (a preprocessing sketch follows after this list). For instance, a frame and its corresponding audio clip can be stored as:

    /data/FakeAVCeleb/test/face/RealVideo-RealAudio/African/women/id04245/00001.jpg 0

    and

    /data/FakeAVCeleb/test/voice/RealVideo-RealAudio/African/women/id04245/00001.wav 0

    The first item is the path to the image/audio file; the second item is the label (0 for real, 1/2/3 for fake).

    The other datasets, e.g., DFDC, can be formatted in the same way.
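The repository expects the frame/audio extraction to be done beforehand. Below is a minimal sketch (not part of this repository) of one way to do it with ffmpeg and to write list entries in the layout above. The 1 fps sampling rate, the 16 kHz mono audio setting, and the output filenames are assumptions; splitting the audio into per-frame clips (e.g., 00001.wav paired with 00001.jpg) should follow your own protocol.

    # prepare_lists.py -- illustrative sketch only; paths and parameters are assumptions.
    import os
    import subprocess

    def extract_frames_and_audio(video_path, face_dir, voice_dir, fps=1):
        """Dump frames as numbered JPEGs and the audio track as a 16 kHz mono WAV."""
        os.makedirs(face_dir, exist_ok=True)
        os.makedirs(voice_dir, exist_ok=True)
        # One frame per second, written as 00001.jpg, 00002.jpg, ...
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path, "-vf", f"fps={fps}",
             os.path.join(face_dir, "%05d.jpg")],
            check=True)
        # Whole audio track; cutting it into per-frame clips is left to the user.
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000",
             os.path.join(voice_dir, "audio.wav")],
            check=True)

    def write_list_file(list_path, entries):
        """entries: iterable of (path, label) pairs; 0 = real, 1/2/3 = fake."""
        with open(list_path, "w") as f:
            for path, label in entries:
                f.write(f"{path} {label}\n")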

Quick Start

  1. Download the pre-trained models from:

    DFDC: link

    FakeAVCeleb: link

    and put them into ./exp/[Dataset].

  2. Run the evaluation script with the corresponding config (a small wrapper that runs both is sketched after this list):

    python test_vfd.py --config ./configs/DFDC/test.yaml

    python test_vfd.py --config ./configs/FakeAVCeleb/test.yaml
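If you want to evaluate both datasets in one go, a small wrapper like the following (not part of the repository) simply shells out to the two commands above:

    # run_all_tests.py -- convenience sketch, not part of the repo.
    import subprocess

    for dataset in ["DFDC", "FakeAVCeleb"]:
        subprocess.run(
            ["python", "test_vfd.py", "--config", f"./configs/{dataset}/test.yaml"],
            check=True)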

Citation

Kindly cite us if you find this paper helpful :)

@article{VFD,
  author    = {Cheng, Harry and Guo, Yangyang and Wang, Tianyi and Li, Qi and Chang, Xiaojun and Nie, Liqiang},
  title     = {Voice-Face Homogeneity Tells Deepfake},
  journal   = {ACM Trans. Multimedia Comput. Commun. Appl.},
  volume    = {20},
  number    = {3},
  year      = {2023},
  publisher = {ACM},
  doi       = {10.1145/3625231},
}