DeCLIP

Official PyTorch Implementation of the Paper:

Ștefan Smeu, Elisabeta Oneață, Dan Oneață
DeCLIP: Decoding CLIP Representations for Deepfake Localization
WACV, 2025

Data

To set up your data, follow these steps:

  1. Download the datasets:

  2. Organize the data:

    After downloading, place the datasets in the datasets folder to match the following structure (a small sanity-check sketch is given after the tree):

    ├── data/
    ├── datasets/
    │   ├── AutoSplice/
    │   ├── dolos_data/
    │   │   ├── celebahq/
    │   │   │   ├── fake/
    │   │   │   │   ├── lama/
    │   │   │   │   ├── ldm/
    │   │   │   │   ├── pluralistic/
    │   │   │   │   ├── repaint-p2-9k/
    │   │   │   ├── real/
    │   │   ├── ffhq/
    ├── models/
    ├── train.py
    ├── validate.py
    ├── ...
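To double-check the layout, the short sketch below (an illustrative helper, not part of the repository; the folder names are taken directly from the tree above) lists any expected dataset folder that is missing:

# check_datasets.py -- minimal sketch to verify the expected dataset layout.
# DATASET_ROOT and the folder names are taken from the tree above; adjust as needed.
from pathlib import Path

DATASET_ROOT = Path("datasets")

EXPECTED_DIRS = [
    "AutoSplice",
    "dolos_data/celebahq/fake/lama",
    "dolos_data/celebahq/fake/ldm",
    "dolos_data/celebahq/fake/pluralistic",
    "dolos_data/celebahq/fake/repaint-p2-9k",
    "dolos_data/celebahq/real",
    "dolos_data/ffhq",
]

missing = [d for d in EXPECTED_DIRS if not (DATASET_ROOT / d).is_dir()]
if missing:
    print("Missing dataset folders:")
    for d in missing:
        print(f"  - {DATASET_ROOT / d}")
else:
    print("All expected dataset folders are present.")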
    

Installation

Main prerequisites:

Train

To train the models described in the paper, follow these steps:

  1. Set up training and validation data paths in options/train_options.py or specify them as arguments when running the training routine.

  2. Run the training command using the following template:

python train.py --name=<experiment_name> --train_dataset=<dataset> --arch=<architecture> --decoder_type=<decoder> --feature_layer=<layer> --fix_backbone --fully_supervised

Example commands:

Train on Repaint-P2:

python train.py --name=test_repaint --train_dataset=repaint-p2-9k --data_root_path=datasets/dolos_data/celebahq/ --arch=CLIP:ViT-L/14 --decoder_type=conv-20 --feature_layer=layer20 --fix_backbone --fully_supervised

Where:

--name: the name of the experiment.
--train_dataset: the dataset to train on (e.g. repaint-p2-9k).
--data_root_path: the root folder of the training data (e.g. datasets/dolos_data/celebahq/).
--arch: the CLIP backbone (e.g. CLIP:ViT-L/14).
--decoder_type: the decoder applied on top of the CLIP features (e.g. conv-20).
--feature_layer: the backbone layer whose features are decoded (e.g. layer20).
--fix_backbone: keep the CLIP backbone frozen during training.
--fully_supervised: train with ground-truth localization masks.

Exceptions:
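
Since the runs for the different inpainting methods differ only in --train_dataset, they can be launched in sequence with a small driver script. The sketch below is illustrative only: the experiment names are placeholders and the remaining flags simply mirror the Repaint-P2 example above.

# train_all.py -- sketch: launch the training command above for several datasets.
# Dataset names follow the directory tree in the Data section; experiment names are placeholders.
import subprocess

DATA_ROOT = "datasets/dolos_data/celebahq/"
DATASETS = ["lama", "ldm", "pluralistic", "repaint-p2-9k"]

for dataset in DATASETS:
    subprocess.run(
        [
            "python", "train.py",
            f"--name=declip_{dataset}",
            f"--train_dataset={dataset}",
            f"--data_root_path={DATA_ROOT}",
            "--arch=CLIP:ViT-L/14",
            "--decoder_type=conv-20",
            "--feature_layer=layer20",
            "--fix_backbone",
            "--fully_supervised",
        ],
        check=True,
    )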

Pretrained Models

We provide trained models for the networks that rely on the ViT and ViT+RN50 backbones, as listed in the table below.

Backbone   Feature Layer    Decoder   Training Dataset   Download Link
ViT        layer20          conv-20   Pluralistic        Download
ViT        layer20          conv-20   LaMa               Download
ViT        layer20          conv-20   RePaint-p2-9k      Download
ViT        layer20          conv-20   LDM                Download
ViT        layer20          conv-20   COCO-SD            Download
ViT+RN50   layer20+layer3   conv-20   Pluralistic        Download
ViT+RN50   layer20+layer3   conv-20   LaMa               Download
ViT+RN50   layer20+layer3   conv-20   RePaint-p2-9k      Download
ViT+RN50   layer20+layer3   conv-20   LDM                Download

Additionally, the checkpoints can be downloaded with gsutil from this GCS bucket. The weights are located in the backbone_VIT and backbone_VIT+RN50 folders, where each checkpoint follows the naming convention <backbone>_<feature_layer>_<decoder>_<training_dataset>, with training_dataset lower-cased. For features concatenated from ViT and RN50, a + character joins the two backbones and the two feature layers.
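
As an illustration of this convention, the sketch below (a hypothetical helper, not part of the repository; the exact capitalization and file extension of the checkpoints are assumptions) builds checkpoint names from the table entries:

# Sketch of the checkpoint naming convention described above.
# Exact capitalization and any file extension are assumptions; adjust to the actual files.
def checkpoint_name(backbones, feature_layers, decoder, training_dataset):
    """Build <backbone>_<feature_layer>_<decoder>_<training_dataset>,
    joining multiple backbones/feature layers with '+'."""
    backbone = "+".join(backbones)
    feature_layer = "+".join(feature_layers)
    return f"{backbone}_{feature_layer}_{decoder}_{training_dataset.lower()}"

# Single-backbone checkpoint trained on Pluralistic:
print(checkpoint_name(["ViT"], ["layer20"], "conv-20", "Pluralistic"))
# -> ViT_layer20_conv-20_pluralistic

# Concatenated ViT+RN50 features:
print(checkpoint_name(["ViT", "RN50"], ["layer20", "layer3"], "conv-20", "LDM"))
# -> ViT+RN50_layer20+layer3_conv-20_ldm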

Evaluation

To evaluate a model, use the following template:

python validate.py --arch=CLIP:ViT-L/14 --ckpt=path/to/the/saved/model/checkpoint/model_epoch_best.pth --result_folder=path/to/save/the/results --fully_supervised
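
To evaluate several checkpoints in one pass, the command above can be wrapped in a small loop. The sketch below is illustrative only: the checkpoint and result paths are placeholders, and additional flags may be needed depending on the checkpoint.

# evaluate_all.py -- sketch: run validate.py for several checkpoints.
# Checkpoint and result paths are placeholders; adapt them to your setup.
import subprocess

CHECKPOINTS = {
    "pluralistic": "checkpoints/ViT_layer20_conv-20_pluralistic/model_epoch_best.pth",
    "lama": "checkpoints/ViT_layer20_conv-20_lama/model_epoch_best.pth",
}

for name, ckpt in CHECKPOINTS.items():
    subprocess.run(
        [
            "python", "validate.py",
            "--arch=CLIP:ViT-L/14",
            f"--ckpt={ckpt}",
            f"--result_folder=results/{name}",
            "--fully_supervised",
        ],
        check=True,
    )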

License

The code is licensed under CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/).

This repository also integrates code from the following repositories:

@inproceedings{ojha2023fakedetect,
  title={Towards Universal Fake Image Detectors that Generalize Across Generative Models},
  author={Ojha, Utkarsh and Li, Yuheng and Lee, Yong Jae},
  booktitle={CVPR},
  year={2023},
}

@inproceedings{patchforensics,
  title={What makes fake images detectable? Understanding properties that generalize},
  author={Chai, Lucy and Bau, David and Lim, Ser-Nam and Isola, Phillip},
  booktitle={European Conference on Computer Vision},
  year={2020},
}

Citation

If you find this work useful in your research, please cite it.

@InProceedings{DeCLIP,
    author    = {Smeu, Stefan and Oneata, Elisabeta and Oneata, Dan},
    title     = {DeCLIP: Decoding CLIP representations for deepfake localization},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2025}
}