This is the official repository of the paper: "IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-conditioned Diffusion Models" (accepted at ICCV 2023)


$${\color{red}Updates}$$

The sampling instructions have been updated. Check the sampling section.

The pretrained FR model submitted to the SDFR competition can be downloaded under IGD-IDiff-Face.

This work is the result of the Master's thesis by Jonas Henry Grebe.

<img align="right" src="etc/idiff-face-overview.png" width=40%>

The availability of large-scale authentic face databases has been crucial to the significant advances made in face recognition research over the past decade. However, recent legal and ethical concerns led to the retraction of many of these databases by their creators, raising questions about the continuity of future face recognition research without one of its key resources. Synthetic face datasets have emerged as a promising alternative to privacy-sensitive authentic data for face recognition development. However, recent synthetic datasets that are used to train face recognition models suffer either from limitations in intra-class diversity or cross-class (identity) discrimination, leading to less optimal verification accuracies, far away from the accuracies achieved by models trained on authentic data. This paper targets this issue by proposing IDiff-Face, a novel approach based on conditional latent diffusion models for synthetic identity generation with realistic identity variations for face recognition training. Through extensive evaluations, our proposed synthetic-based face recognition approach pushed the limits of state-of-the-art performances, achieving, for example, 98.00% accuracy on the LFW benchmark, far ahead from the recent synthetic-based face recognition solutions with 95.40% and closing the gap to authentic-based face recognition with 99.82% accuracy.

Datasets and pretrained models

Please share your name, affiliation, and official email in the request form

Download links for the pre-trained IDiff-Face diffusion model weights:

Download links for the pre-generated synthetic 10K identities x 50 images datasets from the paper:

Download links for the pre-trained face recognition models using synthetic IDiff-Face generated data:

Download for the setup

Download links for the pre-trained autoencoder weights that originally come from the fhq256 LDM from Rombach et al. and strictly follow their licence. Their VQModelInterface submodule has been manually extracted and split into its encoder and decoder models, since the encoder is only used during training and the decoder is only needed for sampling:

The resulting .pt files are then expected to be saved under models/autoencoder/first_stage_encoder_state_dict.pt and models/autoencoder/first_stage_decoder_state_dict.pt, respectively.
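
Before training or sampling, it can help to confirm that both files are where the scripts expect them. The following is a minimal sketch and not part of the repository: the two paths come from the sentence above, while the helper name `missing_autoencoder_files` and the `root` parameter are hypothetical.

```python
from pathlib import Path

# Paths as stated in this README; everything else here is illustrative.
EXPECTED_AUTOENCODER_FILES = (
    Path("models/autoencoder/first_stage_encoder_state_dict.pt"),  # needed for training
    Path("models/autoencoder/first_stage_decoder_state_dict.pt"),  # needed for sampling
)


def missing_autoencoder_files(root="."):
    """Return the expected autoencoder files that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_AUTOENCODER_FILES if not (root / p).is_file()]


if __name__ == "__main__":
    # Run from the project root; prints nothing if both files are present.
    for path in missing_autoencoder_files():
        print(f"missing: {path}")
```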

Results

The following table compares the verification benchmark performances of methods from related work with those of our proposed IDiff-Face approach. The related-work results are those reported by the respective authors. For face recognition training on our synthetic (uniform) data, we used a ResNet-50 with a CosFace loss, which is exactly the same setup that has been used in the SFace and USynthFace works. More detailed results are presented in the paper.

<img align="center" src="etc/results.png" width=90%>


Sample images

<img align="center" src="etc/synthetic_variations.png" width=90%>

How to use the code?

This repository includes the main scripts used for training and evaluating the IDiff-Face models. All experiments of this project were conducted within a Docker container, whose Dockerfile is included in this archive. However, the scripts themselves do not depend on Docker. The commands provided in this README are therefore kept basic and might have to be slightly altered to match your specific environment. You might also have to alter the root paths under configs/paths/gpuc_cluster.yaml.

Setup

Download the FFHQ dataset (128x128) and put the 70,000 unlabelled images under data/ffhq_128/. The training embeddings used as contexts during training are NOT provided; they are expected under data/embeddings_elasticface_128.npy and can be extracted using the extract_face_embeddings_from_dir.py script. For that, the pre-trained ElasticFace-Arc model weights have to be downloaded from the official ElasticFace repository and placed under utils/Elastic_R100_295672backbone.pth. The pre-trained autoencoder for the latent diffusion training is obtained from the pre-trained fhq256 LDM from Rombach et al.; please follow their licence terms. For more information, see the downloads section above.


Training an IDiff-Face model

To train the model with 25% CPD (contextual partial dropout), make sure that the option model: unet_cond_ca_cpd25 is set in configs/train_config.yaml. The CPD probability can be changed by creating a new model specification in the configs/model/ subconfiguration folder. In addition, the dataset: ffhq_folder option has to be set, and the paths in the corresponding subconfiguration configs/dataset/ffhq_folder.yaml have to point to the training images and the pre-extracted embeddings. The model training can be started by executing:

python main.py
 

Naming the trained model: After the model is trained, the model output directory content under outputs/DATE/TIME/ can be copied to another folder e.g. trained_models/unet-cond-ca-bs512-150K-cpd25/. The name of this new folder is now referred to as the MODEL_NAME of the trained model.
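
The copy step can be done by hand or scripted. The snippet below is a small sketch using only the Python standard library; the function name `name_trained_model` and the `target_root` parameter are hypothetical, and `outputs/DATE/TIME` remains a placeholder for your actual run directory.

```python
import shutil
from pathlib import Path


def name_trained_model(run_dir, model_name, target_root="trained_models"):
    """Copy a training run directory to target_root/model_name and return the new path."""
    src = Path(run_dir)
    dst = Path(target_root) / model_name
    if not src.is_dir():
        raise FileNotFoundError(f"run directory not found: {src}")
    # copytree creates missing parent folders and raises FileExistsError if
    # dst already exists, so an existing model folder is never overwritten.
    shutil.copytree(src, dst)
    return dst
```

For example, `name_trained_model("outputs/DATE/TIME", "unet-cond-ca-bs512-150K-cpd25")` would create trained_models/unet-cond-ca-bs512-150K-cpd25/ once DATE and TIME are replaced with the values of your run.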


Sampling with a (pre-trained) IDiff-Face model

<span style="color: red">***** Update ***** </span>

<span style="color: red"> The requirements file (requirements_sampling.txt) was exported from pip list and added to the project folder. </span>

For reproducibility and consistency, the synthetic contexts are NOT generated on-the-fly during sampling. Instead, they are pre-generated and saved in .npy files, which contain Python dicts with identity_names/dummy_names as keys and the associated context vectors as values. This is the same structure used for the training embeddings. In this archive, some pre-generated two-stage contexts are already included. To generate samples with synthetic_uniform contexts, first execute the create_sample_identity_contexts.py script, which pre-computes 15,000 synthetic uniform contexts that you can use for sampling. Then, specify the path to the trained model and the contexts file to be used for sampling in sample_config.yaml. There you can also configure the number of identities to use from the provided contexts file and the number of images per identity context. The samples will be saved under samples/MODEL_NAME/CONTEXT_NAME as identity blocks, e.g. a 4x4 grid block of 128x128 images (total block size is then 512x512). These blocks can then be split using e.g. the split_identity_blocks.py script. Before doing that, however, they have to be aligned. The context generation and sampling scripts can be started via:

python create_sample_identity_contexts.py
python sample.py
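
To make the contexts file layout concrete, here is a hedged sketch of writing and reading a dict of context vectors with NumPy. It is not the code of create_sample_identity_contexts.py. The 512-dimensional size, the `id_00000`-style key names, and sampling uniformly on the unit hypersphere by normalizing Gaussian vectors are assumptions made for illustration.

```python
import numpy as np


def make_uniform_contexts(n_identities, dim=512, seed=0):
    """Build {dummy_name: context_vector} with unit-norm vectors (assumed layout)."""
    rng = np.random.default_rng(seed)
    vectors = rng.standard_normal((n_identities, dim))
    # Normalizing isotropic Gaussian samples yields points uniform on the unit sphere.
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    return {f"id_{i:05d}": vectors[i].astype(np.float32) for i in range(n_identities)}


def save_contexts(contexts, path):
    """Save the dict; np.save pickles it into a 0-d object array."""
    np.save(path, contexts)


def load_contexts(path):
    """Load the dict back; allow_pickle is required for object arrays."""
    return np.load(path, allow_pickle=True).item()
```

Because loading requires `allow_pickle=True`, only open contexts files from sources you trust.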
 

Aligning the samples: The images are aligned using MTCNN detection and ArcFace alignment by executing the align.py script after specifying all data to be aligned in align_config.yaml. Currently, when alignment fails for one image of an identity, the entire identity block is just resized to 112x112 instead of being properly aligned. This fallback can be disabled by setting just_resize_if_fail: False in the config; the entire block is then discarded instead. For the generation of 10,000 identities with 50 samples each, 10,050 identities were initially sampled from the 15,000 pre-generated contexts to account for possible alignment failures and thereby make sure that at least 10,000 identities with 50 aligned images are available for the large-scale training.

python align.py
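
The fallback behaviour controlled by just_resize_if_fail can be summarized as a small decision rule. The function below is only an illustration of that rule as described in this README, not the logic of align.py itself; the function name and the string return values are hypothetical.

```python
def resolve_block(alignment_ok, just_resize_if_fail=True):
    """Decide how one identity block is handled.

    alignment_ok: one boolean per image in the block, True if alignment succeeded.
    Returns "align", "resize", or "discard" (illustrative labels).
    """
    if all(alignment_ok):
        return "align"  # every image was aligned properly
    # At least one image failed: resize the whole block or drop it entirely.
    return "resize" if just_resize_if_fail else "discard"
```

With just_resize_if_fail disabled, every failed block costs one identity, which is why 10,050 identities were sampled: this leaves 50 spare identities to reach the target of 10,000.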

Splitting the blocks: Just execute the split_identity_blocks.py script after ensuring that the paths are correct. The script is very straightforward and easy to modify if any issues should occur.

python split_identity_blocks.py
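
If you need to adapt the splitting, the underlying operation is a regular grid cut. The following NumPy sketch shows one way to do it and is not the repository's split_identity_blocks.py. It assumes the block is already loaded as an (H, W, C) array and that `tile` matches the per-image size in the block (e.g. 128 for a 512x512 block with a 4x4 grid).

```python
import numpy as np


def split_block(block, tile=128):
    """Split an (H, W, C) identity block into (N, tile, tile, C) images, row by row."""
    h, w, c = block.shape
    if h % tile or w % tile:
        raise ValueError(f"block size {h}x{w} is not a multiple of tile size {tile}")
    rows, cols = h // tile, w // tile
    # Expose the grid structure, then move the two grid axes to the front.
    tiles = block.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, c)
```

The images are returned in row-major order, so index 0 is the top-left image of the block and the last index is the bottom-right one.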

Training a Face Recognition (FR) model

The dependencies for training the FR model are different. We used the training setup of USynthFace and refer to it for the dependencies. The code is provided under face_recognition_training/, and the configuration file face_recognition_training/config/config.py can be adapted as needed. The training of a CosFace FR model can be started via:

./training_large_scale_with_augment.sh

More information on remaining folders and scripts:

Directories:

Main scripts:

Citation

If you use IDiff-Face or any code in this repository, please cite the following paper:

@inproceedings{Boutros2023IDiffFace,
    author    = {Fadi Boutros and Jonas Henry Grebe and Arjan Kuijper and Naser Damer},
    title     = {IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-conditioned Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023}
}

Reference repositories

License

This project is licensed under the terms of the Attribution-NonCommercial-ShareAlike 4.0 
International (CC BY-NC-SA 4.0) license. 
Copyright (c) 2023 Fraunhofer Institute for Computer Graphics Research IGD Darmstadt