AV-Deepfake1M

<div align="center"> <img src="assets/teaser.png"> <p></p> </div> <div align="center"> <a href="https://github.com/ControlNet/AV-Deepfake1M/issues"> <img src="https://img.shields.io/github/issues/ControlNet/AV-Deepfake1M?style=flat-square"> </a> <a href="https://github.com/ControlNet/AV-Deepfake1M/network/members"> <img src="https://img.shields.io/github/forks/ControlNet/AV-Deepfake1M?style=flat-square"> </a> <a href="https://github.com/ControlNet/AV-Deepfake1M/stargazers"> <img src="https://img.shields.io/github/stars/ControlNet/AV-Deepfake1M?style=flat-square"> </a> <a href="https://pypi.org/project/avdeepfake1m/"><img src="https://img.shields.io/pypi/v/avdeepfake1m?style=flat-square"></a> <a href="https://pypi.org/project/avdeepfake1m/"><img src="https://img.shields.io/pypi/dm/avdeepfake1m?style=flat-square"></a> <a href="https://github.com/ControlNet/AV-Deepfake1M/blob/master/LICENSE"> <img src="https://img.shields.io/badge/license-CC%20BY--NC%204.0-97ca00?style=flat-square"> </a> <a href="https://arxiv.org/abs/2311.15308"> <img src="https://img.shields.io/badge/arXiv-2311.15308-b31b1b.svg?style=flat-square"> </a> </div>

This is the official repository for the paper AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset (Best Paper Award).

Abstract

The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake images and videos, only a few works address the problem of the localization of small segments of audio-visual manipulations embedded in real videos. In this research, we emulate the process of such content generation and propose the AV-Deepfake1M dataset. The dataset contains content-driven (i) video manipulations, (ii) audio manipulations, and (iii) audio-visual manipulations for more than 2K subjects resulting in a total of more than 1M videos. The paper provides a thorough description of the proposed data generation pipeline accompanied by a rigorous analysis of the quality of the generated data. The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets. The proposed dataset will play a vital role in building the next-generation deepfake localization methods.

Dataset

Download

We're hosting the 1M-Deepfakes Detection Challenge at ACM MM 2024.

Baseline Benchmark

| Method                    | AP@0.5 | AP@0.75 | AP@0.9 | AP@0.95 | AR@50 | AR@20 | AR@10 | AR@5  |
|---------------------------|--------|---------|--------|---------|-------|-------|-------|-------|
| PyAnnote                  | 00.03  | 00.00   | 00.00  | 00.00   | 00.67 | 00.67 | 00.67 | 00.67 |
| Meso4                     | 09.86  | 06.05   | 02.22  | 00.59   | 38.92 | 38.81 | 36.47 | 26.91 |
| MesoInception4            | 08.50  | 05.16   | 01.89  | 00.50   | 39.27 | 39.00 | 35.78 | 24.59 |
| EfficientViT              | 14.71  | 02.42   | 00.13  | 00.01   | 27.04 | 26.43 | 23.90 | 20.31 |
| TriDet + VideoMAEv2       | 21.67  | 05.83   | 00.54  | 00.06   | 20.27 | 20.12 | 19.50 | 18.18 |
| TriDet + InternVideo      | 29.66  | 09.02   | 00.79  | 00.09   | 24.08 | 23.96 | 23.50 | 22.55 |
| ActionFormer + VideoMAEv2 | 20.24  | 05.73   | 00.57  | 00.07   | 19.97 | 19.81 | 19.11 | 17.80 |
| ActionFormer + InternVideo| 36.08  | 12.01   | 01.23  | 00.16   | 27.11 | 27.00 | 26.60 | 25.80 |
| BA-TFD                    | 37.37  | 06.34   | 00.19  | 00.02   | 45.55 | 35.95 | 30.66 | 26.82 |
| BA-TFD+                   | 44.42  | 13.64   | 00.48  | 00.03   | 48.86 | 40.37 | 34.67 | 29.88 |
| UMMAFormer                | 51.64  | 28.07   | 07.65  | 01.58   | 44.07 | 43.45 | 42.09 | 40.27 |

Metadata Structure

The metadata is a JSON file for each subset (train, val), each containing a list of dictionaries, one per video. The fields in the dictionary are as follows.
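As a minimal sketch of working with this structure, the snippet below loads such a list and filters out videos that contain fake segments. The `file` and `fake_segments` keys match those passed to the evaluation call later in this document; the sample file names and segment values are made up for illustration.

```python
import json
from pathlib import Path

# Hypothetical metadata entries mirroring the list-of-dictionaries layout;
# real entries come from e.g. train_metadata.json.
sample = [
    {"file": "id00001/00001.mp4", "fake_segments": []},
    {"file": "id00001/00002.mp4", "fake_segments": [[2.40, 3.15]]},
]
# In practice: sample = json.loads(Path("train_metadata.json").read_text())

fake_videos = [m["file"] for m in sample if m["fake_segments"]]
print(fake_videos)
```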

SDK

We provide a Python library, avdeepfake1m, for loading the dataset and running the evaluation.

Installation

pip install avdeepfake1m

Usage

Prepare the dataset as follows.

|- train_metadata.json
|- train_metadata
|  |- ...
|- train
|  |- ...
|- val_metadata.json
|- val_metadata
|  |- ...
|- val
|  |- ...
|- test_files.txt
|- test
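Before loading, it can help to sanity-check that the extracted dataset matches this layout. A small sketch, where the root path is a placeholder to replace with your own:

```python
from pathlib import Path

# Hypothetical root; point this at wherever the dataset was extracted.
root = Path("/path/to/dataset")

expected = ["train_metadata.json", "train_metadata", "train",
            "val_metadata.json", "val_metadata", "val",
            "test_files.txt", "test"]
missing = [name for name in expected if not (root / name).exists()]
if missing:
    print("Missing entries:", missing)
else:
    print("Dataset layout looks complete.")
```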

Load the dataset.

from avdeepfake1m.loader import AVDeepfake1mDataModule

# access to Lightning DataModule
dm = AVDeepfake1mDataModule("/path/to/dataset")

Evaluate the predictions. First, prepare the predictions as described in the details, then run the following code.

from avdeepfake1m.evaluation import ap_ar_1d, auc

print(ap_ar_1d(
    "<PREDICTION_JSON>", "<METADATA_JSON>", "file", "fake_segments", 1,
    [0.5, 0.75, 0.9, 0.95],
    [50, 30, 20, 10, 5],
    [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
))
print(auc("<PREDICTION_TXT>", "<METADATA_JSON>"))
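The AP/AR metrics above are based on temporal IoU between predicted and ground-truth segments: a prediction counts as a hit at AP@0.5 if its IoU with a fake segment is at least 0.5. As a minimal illustration (not the library's implementation), 1-D temporal IoU can be computed as:

```python
def iou_1d(a, b):
    """Temporal IoU between two [start, end] segments, in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction overlapping a ground-truth fake segment:
print(iou_1d([2.0, 3.0], [2.4, 3.2]))  # ~0.5, so a hit at AP@0.5
```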

License

The dataset is under the EULA. You need to agree to and sign the EULA to access the dataset.

The other parts of this project are under the CC BY-NC 4.0 license. See LICENSE for details.

References

If you find this work useful in your research, please cite it.

@inproceedings{cai2024av,
  title={AV-Deepfake1M: A large-scale LLM-driven audio-visual deepfake dataset},
  author={Cai, Zhixi and Ghosh, Shreya and Adatia, Aman Pankaj and Hayat, Munawar and Dhall, Abhinav and Gedeon, Tom and Stefanov, Kalin},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={7414--7423},
  year={2024},
  doi={10.1145/3664647.3680795}
}