Home

Awesome

The model is now available HERE

(Requires to sign End User License Agreement)

Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks

ArXiv, Supplementary Video

Baris Gecer<sup> 1,2</sup>, Alexander Lattas<sup> 1,2</sup>, Stylianos Ploumpis<sup> 1,2</sup>, Jiankang Deng<sup> 1,2</sup>, Athanasios Papaioannou<sup> 1,2</sup>, Stylianos Moschoglou<sup> 1,2</sup>, & Stefanos Zafeiriou<sup> 1,2</sup> <br/> <sup>1 </sup>Imperial College London <br/> <sup>2 </sup>FaceSoft.io

This repo provides Tensorflow implementation of above paper for training

Abstract

<p align="center"><img width="100%" src="representative.png" /></p>

Generating realistic 3D faces is of high importance for computer graphics and computer vision applications. Generally, research on 3D face generation revolves around linear statistical models of the facial surface. Nevertheless, these models cannot represent faithfully either the facial texture or the normals of the face, which are very crucial for photo-realistic face synthesis. Recently, it was demonstrated that Generative Adversarial Networks (GANs) can be used for generating high-quality textures of faces. Nevertheless, the generation process either omits the geometry and normals, or independent processes are used to produce 3D shape information. In this paper, we present the first methodology that generates high-quality texture, shape, and normals jointly, which can be used for photo-realistic synthesis. To do so, we propose a novel GAN that can generate data from different modalities while exploiting their correlations. Furthermore, we demonstrate how we can condition the generation on the expression and create faces with various facial expressions. The qualitative results shown in this paper are compressed due to size limitations, full-resolution results and the accompanying video can be found in the supplementary documents.

<br/>

Supplementary Video

<p align="center"><img width="100%" alt="Watch the video" title="Click to Watch on YouTube" src="https://img.youtube.com/vi/wehBCetIb7E/sddefault.jpg" /></p>

Testing the Model

pip install menpo3d

python test.py

Preparing datasets for training

The TBGAN code repository contains a command-line tool for recreating bit-exact replicas of the datasets that we used in the paper. The tool also provides various utilities for operating on the datasets:

usage: dataset_tool.py [-h] <command> ...

    display             Display images in dataset.
    extract             Extract images from dataset.
    compare             Compare two datasets.
    create_from_pkl_img_norm  Create dataset from a directory full of texture, normals and shape.

Type "dataset_tool.py <command> -h" for more information.
Please ignore other functions. The main function to prepare tf_records is 'create_from_pkl_img_norm'

The datasets are represented by directories containing the same image data in several resolutions to enable efficient streaming. There is a separate *.tfrecords file for each resolution, and if the dataset contains labels, they are stored in a separate file as well:

> python dataset_tool.py create_from_pkl_img_norm datasets/tf_records datasets/texture(/*.png) dataset/shape(/*.pkl) dataset/normals(/*.pkl)

The create_* commands take the standard version of a given dataset as input and produce the corresponding *.tfrecords files as output.

Training networks

Please see how to start training with a PROGAN
Additionally, you will need to add 
> "dynamic_range=[-1,1],dtype = 'float32'" 
arguments to 'dataset' EasyDict() in config.py

Once the necessary datasets are set up, you can proceed to train your own networks. The general procedure is as follows:

  1. Edit config.py to specify the dataset and training configuration by uncommenting/editing specific lines.
  2. Run the training script with python train.py.
  3. The results are written into a newly created subdirectory under config.result_dir
  4. Wait several days (or weeks) for the training to converge, and analyze the results.

By default, config.py is configured to train a 1024x1024 network for CelebA-HQ using a single-GPU. This is expected to take about two weeks even on the highest-end NVIDIA GPUs. The key to enabling faster training is to employ multiple GPUs and/or go for a lower-resolution dataset. To this end, config.py contains several examples for commonly used datasets, as well as a set of "configuration presets" for multi-GPU training. All of the presets are expected to yield roughly the same image quality for CelebA-HQ, but their total training time can vary considerably:

For reference, the expected output of each configuration preset for CelebA-HQ can be found in networks/tensorflow-version/example_training_runs

Other noteworthy config options:

Analyzing results

Training results can be analyzed in several ways:

Acknowledgement

Baris Gecer is supported by the Turkish Ministry of National Education, Stylianos Ploumpis by the EPSRC Project EP/N007743/1 (FACER2VM), and Stefanos Zafeiriou by EPSRC Fellowship DEFORM (EP/S010203/1).

Code borrows heavily from NVIDIA's PRO-GAN implementation, please check and comply with its License. and cite their paper:

@inproceedings{karras2018progressive,
  title={Progressive Growing of GANs for Improved Quality, Stability, and Variation},
  author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  booktitle={International Conference on Learning Representations},
  year={2018}
}

Citation

If you find this work is useful for your research, please cite our paper:

@inproceedings{gecer2020tbgan,
  title={Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks},
  author={{Gecer}, Baris and {Lattas}, Alexander and {Ploumpis}, Stylianos and
         {Deng}, Jiankang and {Papaioannou}, Athanasios and
         {Moschoglou}, Stylianos and {Zafeiriou}, Stefanos},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  year={2020},
  organization={Springer}
  doi = {10.1007/978-3-030-58526-6_25}
}