Official implementation for TALL and TALL++
[ICCV-2023] Thumbnail Layout for Deepfake Video Detection
[IJCV-2024] Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection
- 2024.3.8 :tada: The improved version, TALL++, has been accepted by IJCV 2024!
- 2024.3.7 Updated the basic data preparation code, which comes from FaceForensics.
- 2024.2.18 The version published by ICCV contains a small error concerning the appendix. We have added the appendix to the main text; the revised paper is available on arXiv.
Attention: The code for our improved IJCV extension (TALL++, https://arxiv.org/pdf/2403.10261.pdf ) will be made available in this repository.
Our implementation is based on Swin-Transformer.
Requirements
- einops
- fvcore
- timm==0.4.12
- torch==1.13.1
- torchaudio==0.13.1
- torchvision==0.14.1
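Since several packages are pinned to exact versions, it can help to check the environment before training. The following is a minimal sketch, not part of this repository: the helper name `check_requirements` and the `REQUIRED` table are hypothetical, and the pins are copied from the list above. It uses only the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Pins copied from the Requirements list; None means no version is pinned.
REQUIRED = {
    "einops": None,
    "fvcore": None,
    "timm": "0.4.12",
    "torch": "1.13.1",
    "torchaudio": "0.13.1",
    "torchvision": "0.14.1",
}


def check_requirements(required=REQUIRED, get_version=metadata.version):
    """Return a list of human-readable problems (empty if everything matches)."""
    problems = []
    for name, pinned in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        # Drop local build tags such as "+cu117" before comparing.
        if pinned is not None and installed.split("+")[0] != pinned:
            problems.append(f"{name}: found {installed}, expected {pinned}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

An empty result means every listed package is installed and the pinned ones match.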
Data Preparation
Please refer to FaceForensics for how to prepare deepfake datasets such as FF++, Celeb-DF, and DFDC.
The data loader reads image sequences listed in txt files, one sequence per line, in the following format:
#example for train.txt
# path | start frame | end frame | label
original_faces_c23/928 1 300 0
original_faces_c23/712 1 300 0
original_faces_c23/582 1 300 0
original_faces_c23/602 1 300 0
deepfakes_faces_c23/143_140 1 300 1
deepfakes_faces_c23/408_424 1 300 1
deepfakes_faces_c23/766_761 1 300 1
deepfakes_faces_c23/964_174 1 300 1
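To make the list format concrete, here is a minimal sketch of how such a file could be parsed. It is not the repository's data loader: the names `ClipRecord` and `parse_list_lines` are hypothetical, and skipping blank lines and lines starting with `#` is an assumption. In the example above, label 0 appears on the original sequences and label 1 on the deepfake sequences.

```python
from typing import NamedTuple


class ClipRecord(NamedTuple):
    path: str         # directory holding the extracted face frames
    start_frame: int  # first frame index
    end_frame: int    # last frame index
    label: int        # 0 for original, 1 for fake in the example above


def parse_list_lines(lines):
    """Parse 'path start end label' lines into ClipRecord tuples."""
    records = []
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines carry no data.
        if not line or line.startswith("#"):
            continue
        # Split from the right so a path containing spaces stays intact.
        path, start, end, label = line.rsplit(maxsplit=3)
        records.append(ClipRecord(path, int(start), int(end), int(label)))
    return records


sample = [
    "# path | start frame | end frame | label",
    "original_faces_c23/928 1 300 0",
    "deepfakes_faces_c23/143_140 1 300 1",
]
for record in parse_list_lines(sample):
    print(record.path, record.end_frame - record.start_frame + 1, record.label)
```

A line with fewer than four fields raises a `ValueError`, which surfaces malformed entries early instead of silently skipping them.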
Training:
[IMPORTANT] Edit main.py and change the default arg-parser values to match your setup (especially the config paths).
CUDA_VISIBLE_DEVICES=0 python main.py --dataset ffpp \
--input-size 112 --num_clips 8 --output_dir [your_output_dir] --opt adamw --lr 1.5e-5 --warmup-lr 1.5e-8 --min-lr 1.5e-7 \
--epochs 60 --sched cosine --duration 4 --batch-size 4 --thumbnail_rows 2 --disable_scaleup --cutout True \
--pretrained --warmup-epochs 10 --no-amp --model TALL_SWIN \
--hpe_to_token 2>&1 | tee ./output/train_ffpp_`date +'%m_%d-%H_%M'`.log
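The flags `--duration 4`, `--thumbnail_rows 2`, and `--input-size 112` in the command above presumably set the number of frames per thumbnail, the number of grid rows, and the side length of each frame. As a rough illustration of the thumbnail layout idea, the sketch below tiles equally sized frames row by row into a single grid using plain Python lists. It is not the repository's implementation, which works on image tensors; the function name `make_thumbnail` and the row-major ordering are assumptions.

```python
def make_thumbnail(frames, rows):
    """Tile equally sized 2D frames into one grid with the given number of rows.

    frames: list of frames, each a list of pixel rows (lists of values).
    Frames are placed left to right, top to bottom (row-major order).
    """
    if rows <= 0 or len(frames) % rows != 0:
        raise ValueError("number of frames must be a multiple of rows")
    cols = len(frames) // rows
    height = len(frames[0])
    thumbnail = []
    for r in range(rows):
        for y in range(height):
            line = []
            # Concatenate pixel row y of every frame in grid row r.
            for c in range(cols):
                line.extend(frames[r * cols + c][y])
            thumbnail.append(line)
    return thumbnail


# Four 2x2 frames, each filled with its own index, tiled into a 2x2 grid.
frames = [[[i] * 2 for _ in range(2)] for i in range(4)]
for line in make_thumbnail(frames, rows=2):
    print(line)
# prints:
# [0, 0, 1, 1]
# [0, 0, 1, 1]
# [2, 2, 3, 3]
# [2, 2, 3, 3]
```

Under this reading, four 112x112 frames arranged in two rows would form one 224x224 thumbnail.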
Evaluation:
CUDA_VISIBLE_DEVICES=0 python test.py --dataset ffpp \
--input_size 112 --opt adamw --lr 1e-4 --epochs 30 --sched cosine --duration 4 --batch-size 4 --thumbnail_rows 2 --disable_scaleup \
--pretrained --warmup-epochs 5 --no-amp --model TALL_SWIN \
--hpe_to_token --initial_checkpoint [model_checkpoint] --eval --num_crops 1 --num_clips 8 \
2>&1 | tee ./output/test_ffpp_`date +'%m_%d-%H_%M'`.log
Citation
@inproceedings{xu2023tall,
title={TALL: Thumbnail Layout for Deepfake Video Detection},
author={Xu, Yuting and Liang, Jian and Jia, Gengyun and Yang, Ziming and Zhang, Yanhao and He, Ran},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={22658--22668},
year={2023}
}