Localized Audio Visual DeepFake Dataset (LAV-DF)

This repo is the official PyTorch implementation for the DICTA paper "Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization" (Best Award), and for the journal paper "Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization", accepted by CVIU.

LAV-DF Dataset

Download

To use the LAV-DF dataset, you must agree to the terms and conditions.

Download link: OneDrive, Google Drive, HuggingFace.

Baseline Benchmark

Method	AP@0.5	AP@0.75	AP@0.95	AR@100	AR@50	AR@20	AR@10
BA-TFD	79.15	38.57	00.24	67.03	64.18	60.89	58.51
BA-TFD+	96.30	84.96	04.44	81.62	80.48	79.40	78.75

Note that the BA-TFD result here is slightly better than the one reported in the paper, because this repository uses better hyperparameters.
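For readers unfamiliar with the column names, the sketch below illustrates what a threshold such as AP@0.5 refers to, assuming (as is conventional in temporal localization) that the number is a temporal IoU threshold between a predicted and a ground-truth fake segment. The function names temporal_iou and is_hit are hypothetical and are not taken from this repository's evaluation code.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) segments, in seconds."""
    pred_start, pred_end = pred
    truth_start, truth_end = truth
    # Overlap length is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, truth_end) - max(pred_start, truth_start))
    union = (pred_end - pred_start) + (truth_end - truth_start) - intersection
    return intersection / union if union > 0 else 0.0


def is_hit(pred, truth, threshold):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return temporal_iou(pred, truth) >= threshold


if __name__ == "__main__":
    prediction = (0.0, 4.0)    # illustrative values, not dataset results
    ground_truth = (1.0, 4.0)
    for threshold in (0.5, 0.75, 0.95):
        print(threshold, is_hit(prediction, ground_truth, threshold))
```

Under this reading, a stricter threshold such as 0.95 demands almost exact boundaries, which is consistent with the much lower AP@0.95 values in the table.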

Baseline Models

Requirements

The main dependency versions are:

Python >= 3.7, < 3.11
PyTorch >= 1.13
torchvision >= 0.14
pytorch_lightning == 1.7.*

Run the following command to install the required packages.

pip install -r requirements.txt
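After installation, the version constraints listed above can be checked with a small script. This is a minimal sketch, not part of the repository: the helpers parse_version and check_versions are hypothetical, and the parser only handles plain dotted versions with an optional local suffix such as +cu117.

```python
import re


def parse_version(text):
    """Turn a string like '1.13.1+cu117' into a tuple such as (1, 13, 1)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def check_versions(python, torch, torchvision, pytorch_lightning):
    """Compare version strings against the constraints listed in this README."""
    return {
        "python": (3, 7) <= parse_version(python)[:2] < (3, 11),
        "torch": parse_version(torch) >= (1, 13),
        "torchvision": parse_version(torchvision) >= (0, 14),
        "pytorch_lightning": parse_version(pytorch_lightning)[:2] == (1, 7),
    }


if __name__ == "__main__":
    # Illustrative version strings; replace them with the installed ones.
    print(check_versions("3.9.16", "1.13.1+cu117", "0.14.1", "1.7.7"))
```

In a real environment, the arguments would come from platform.python_version(), torch.__version__, torchvision.__version__ and pytorch_lightning.__version__.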

Training BA-TFD

Train the BA-TFD model introduced in the paper "Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization" with the default hyperparameters on the LAV-DF dataset.

python train.py \
  --config ./config/batfd_default.toml \
  --data_root <DATASET_PATH> \
  --batch_size 4 --num_workers 8 --gpus 1 --precision 16

The checkpoint will be saved in the ckpt directory, and the TensorBoard log will be saved in the lightning_logs directory.

Training BA-TFD+

Train the BA-TFD+ model introduced in the paper "Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization" with the default hyperparameters on the LAV-DF dataset.

python train.py \
  --config ./config/batfd_plus_default.toml \
  --data_root <DATASET_PATH> \
  --batch_size 4 --num_workers 8 --gpus 2 --precision 32

Use FP32 (--precision 32) when training BA-TFD+, because FP16 causes inf and NaN values.

The checkpoint will be saved in the ckpt directory, and the TensorBoard log will be saved in the lightning_logs directory.

Evaluation

You can evaluate a checkpoint saved in the ckpt directory, or download the BA-TFD and BA-TFD+ pretrained models.

Run the following command to evaluate the model.

python evaluate.py \
  --config <CONFIG_PATH> \
  --data_root <DATASET_PATH> \
  --checkpoint <CHECKPOINT_PATH> \
  --batch_size 1 --num_workers 4

The script writes the temporal inference results to the output directory and prints the AP and AR scores to the console.

Note: make sure only one GPU is visible to the evaluation script.
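One common way to expose a single GPU to a CUDA program is the CUDA_VISIBLE_DEVICES environment variable. The sketch below launches the evaluation command shown above with that variable set. The helper names single_gpu_env and build_eval_command are hypothetical, and the paths in the usage comment are placeholders to replace with your own.

```python
import os
import subprocess
import sys


def single_gpu_env(gpu_index=0, base_env=None):
    """Copy the environment and expose only one GPU to the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def build_eval_command(config, data_root, checkpoint):
    """Assemble the evaluate.py command line documented in this README."""
    return [
        sys.executable, "evaluate.py",
        "--config", config,
        "--data_root", data_root,
        "--checkpoint", checkpoint,
        "--batch_size", "1",
        "--num_workers", "4",
    ]


def run_evaluation(config, data_root, checkpoint, gpu_index=0):
    """Run the evaluation with only the chosen GPU visible."""
    command = build_eval_command(config, data_root, checkpoint)
    return subprocess.run(command, env=single_gpu_env(gpu_index), check=True)


# Usage (placeholders, run from the repository root):
# run_evaluation("<CONFIG_PATH>", "<DATASET_PATH>", "<CHECKPOINT_PATH>", gpu_index=0)
```

Setting the variable only for the child process keeps the rest of the shell session unaffected, which is convenient on a multi-GPU machine used for training at the same time.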

License

This project is under the CC BY-NC 4.0 license. See LICENSE for details.

References

If you find this work useful in your research, please cite the following papers.

The conference paper:

@inproceedings{cai2022you,
  title = {Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization},
  author = {Cai, Zhixi and Stefanov, Kalin and Dhall, Abhinav and Hayat, Munawar},
  booktitle = {2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)},
  year = {2022},
  doi = {10.1109/DICTA56598.2022.10034605},
  pages = {1--10},
  address = {Sydney, Australia},
}

The extended journal version, accepted by CVIU:

@article{cai2023glitch,
  title = {Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization},
  author = {Cai, Zhixi and Ghosh, Shreya and Dhall, Abhinav and Gedeon, Tom and Stefanov, Kalin and Hayat, Munawar},
  journal = {Computer Vision and Image Understanding},
  year = {2023},
  volume = {236},
  pages = {103818},
  issn = {1077-3142},
  doi = {10.1016/j.cviu.2023.103818},
}

Acknowledgements

Some code related to the boundary matching mechanism is borrowed from JJBOY/BMN-Boundary-Matching-Network and xxcheng0708/BSNPlusPlus-boundary-sensitive-network.