ClipFace: Text-guided Editing of Textured 3D Morphable Models<br><sub>Official PyTorch implementation of the SIGGRAPH 2023 paper</sub>

(Teaser figure)

ClipFace: Text-guided Editing of Textured 3D Morphable Models<br> Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner<br> https://shivangi-aneja.github.io/projects/clipface <br>

Abstract: We propose ClipFace, a novel self-supervised approach for text-guided editing of textured 3D morphable model of faces. Specifically, we employ user-friendly language prompts to enable control of the expressions as well as appearance of 3D faces. We leverage the geometric expressiveness of 3D morphable models, which inherently possess limited controllability and texture expressivity, and develop a self-supervised generative model to jointly synthesize expressive, textured, and articulated faces in 3D. We enable high-quality texture generation for 3D faces by adversarial self-supervised training, guided by differentiable rendering against collections of real RGB images. Controllable editing and manipulation are given by language prompts to adapt texture and expression of the 3D morphable model. To this end, we propose a neural network that predicts both texture and expression latent codes of the morphable model. Our model is trained in a self-supervised fashion by exploiting differentiable rendering and losses based on a pre-trained CLIP model. Once trained, our model jointly predicts face textures in UV-space, along with expression parameters to capture both geometry and texture changes in facial expressions in a single forward pass. We further show the applicability of our method to generate temporally changing textures for a given animation sequence.
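
For orientation, the core training signal described in the abstract boils down to a few lines of PyTorch: a mapper predicts texture and expression offsets, the result is rendered differentiably, and the rendering is scored against a text prompt with a frozen, pre-trained CLIP model. The following is a minimal conceptual sketch of that objective, not the code in this repository: `TextureExpressionMapper` and `DummyRenderer` are hypothetical stand-ins for the actual mapper network and differentiable renderer, and only the `clip.load` / `clip.tokenize` / `encode_image` / `encode_text` calls refer to the public openai-clip package.

```python
# Conceptual sketch only (not the official training code): optimize a mapper that
# predicts texture/expression offsets so that a differentiable rendering of the
# textured FLAME mesh matches a text prompt under a frozen CLIP model.
import torch
import torch.nn as nn
import clip  # openai-clip package

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
for p in clip_model.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; only the mapper is trained

class TextureExpressionMapper(nn.Module):
    """Hypothetical stand-in for the text-guided mapper over texture/expression codes."""
    def __init__(self, tex_dim=512, exp_dim=50, hidden=512):
        super().__init__()
        self.tex_head = nn.Sequential(nn.Linear(tex_dim, hidden), nn.ReLU(), nn.Linear(hidden, tex_dim))
        self.exp_head = nn.Sequential(nn.Linear(tex_dim, hidden), nn.ReLU(), nn.Linear(hidden, exp_dim))
    def forward(self, w):
        return self.tex_head(w), self.exp_head(w)

class DummyRenderer(nn.Module):
    """Hypothetical stand-in for texture decoding + FLAME deformation + rasterization."""
    def forward(self, tex_latent, exp_params):
        # The real pipeline decodes a UV texture, deforms FLAME, and rasterizes the
        # mesh; here we only emit a differentiable dummy image of CLIP's input size.
        return (tex_latent.mean() + exp_params.mean()).expand(1, 3, 224, 224)

mapper = TextureExpressionMapper().to(device)
renderer = DummyRenderer().to(device)
optimizer = torch.optim.Adam(mapper.parameters(), lr=1e-4)

tokens = clip.tokenize(["a 3D render of a smiling face"]).to(device)
text_feat = clip_model.encode_text(tokens).float()
text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

w = torch.randn(1, 512, device=device)  # texture latent code from the texture generator
for step in range(100):
    tex_offset, exp_params = mapper(w)
    rendering = renderer(w + tex_offset, exp_params)
    img_feat = clip_model.encode_image(rendering).float()
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = (1.0 - (img_feat * text_feat).sum(dim=-1)).mean()  # CLIP cosine distance
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```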

<br>

<a id="section1">1. Getting started</a>

Pre-requisites

Installation

<a id="section2">2. Pre-trained Models required for training ClipFace</a>

Please download these models, as they will be required for experiments.

| Path | Description |
| --- | --- |
| FLAME | We use the FLAME 3DMM in our experiments. FLAME takes shape, pose, and expression blendshapes as input and predicts mesh vertices. We used the FLAME 2020 generic model; using any other FLAME model may lead to wrong mesh predictions in the expression manipulation experiments. Please download the model from the official website after signing the user agreement. Copy the generic model to `data/flame/generic_model.pkl` and the FLAME template to `data/flame/head_template.obj` in the project directory (a quick path check is sketched after this table). |
| DECA | DECA predicts FLAME parameters for an RGB image. It is used when training the StyleGAN-based texture generator and is available for download here. This can be skipped if you don't intend to train the texture generator and instead use our pre-trained texture generator. |
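
After downloading, you can quickly verify that the FLAME assets ended up where the code expects them. This is only a convenience sketch; the two paths below are the ones listed above, and nothing else about the repository layout is assumed.

```python
# Sanity check (convenience sketch): verify the FLAME assets are in place.
from pathlib import Path

required = [
    Path("data/flame/generic_model.pkl"),   # FLAME 2020 generic model
    Path("data/flame/head_template.obj"),   # FLAME head template mesh
]

missing = [p for p in required if not p.is_file()]
if missing:
    raise FileNotFoundError(f"Missing FLAME assets: {', '.join(map(str, missing))}")
print("All FLAME assets found.")
```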

<a id="section3">3. Training</a>

The code is well-documented and should be easy to follow.

<a id="section4">4. ClipFace Pretrained Models and Dataset Assets</a>

| Path | Description |
| --- | --- |
| Filtered FFHQ Dataset | The filenames of the filtered FFHQ dataset, along with alpha masks and FLAME-space mesh vertices predicted using DECA. This can be skipped if you don't intend to train the texture generator and instead use our pre-trained texture generator. |
| Texture Generator | The pretrained texture generator used to synthesize UV texture maps. |
| UV Texture Latent Codes | The latent codes generated by the texture generator, used to train the text-guided mapper networks. |
| Text-Manipulation Assets | The FLAME parameters and vertices for a neutral template face, used to perform CLIP-guided manipulation. Copy these to the `data/clip/` directory. |
| Video Manipulation Dataset | In the paper, we show temporal-texture results for two text prompts (laughing and angry). Here we provide the pre-computed FLAME parameters for these sequences. Download them, extract them to an appropriate directory, and set the corresponding path for the key `exp_codes_pth` in `config/clipface.yaml` (see the sketch after this table). |
| Pretrained Zero-Offset Mapper | Pretrained mappers that predict zero offsets for text-guided manipulation. |
| Pretrained Texture & Expression Manipulation Models | Pretrained ClipFace checkpoints for the different texture and expression styles shown in the paper. Texture manipulation models can be downloaded here; expression manipulation models can be downloaded here. |
| Pretrained Zero-Offset Video Mapper | Pretrained mappers that predict zero offsets for text-guided video manipulation. |
| Pretrained Video Manipulation Models | Pretrained ClipFace checkpoints for video manipulation: the model for the text prompt laughing is available here, and the model for the text prompt angry is available here. |
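
The video manipulation entry above asks you to point the key `exp_codes_pth` in `config/clipface.yaml` at the extracted FLAME parameter sequences. The snippet below is a small convenience sketch for doing that programmatically; it assumes PyYAML and a top-level key, and the destination path `data/video/laughing_exp_codes.pkl` is only a hypothetical example of where you might have extracted the data.

```python
# Convenience sketch: point `exp_codes_pth` in config/clipface.yaml at the
# extracted FLAME parameter sequence. Assumes PyYAML and a top-level key;
# the example path below is hypothetical, use wherever you extracted the data.
import yaml

CONFIG = "config/clipface.yaml"
EXP_CODES = "data/video/laughing_exp_codes.pkl"  # hypothetical location

with open(CONFIG) as f:
    cfg = yaml.safe_load(f)

cfg["exp_codes_pth"] = EXP_CODES  # assumes the key sits at the top level

with open(CONFIG, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)

print(f"Set exp_codes_pth -> {EXP_CODES}")
```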

<a id="section5">5. Inference</a>

```bash
# To evaluate only for texture manipulation
python -m tests.test_mlp_texture

# To evaluate for both texture and expression manipulation
python -m tests.test_mlp_texture_expression

# To evaluate temporal textures for a given video sequence
python -m tests.test_video_mlp_texture
```
<br>

Citation

If you find our dataset or paper useful for your research, please include the following citation:


```bibtex
@inproceedings{aneja2023clipface,
    author    = {Aneja, Shivangi and Thies, Justus and Dai, Angela and Nie{\ss}ner, Matthias},
    booktitle = {SIGGRAPH '23 Conference Proceedings},
    title     = {ClipFace: Text-guided Editing of Textured 3D Morphable Models},
    year      = {2023},
    doi       = {10.1145/3588432.3591566},
    url       = {https://shivangi-aneja.github.io/projects/clipface/},
    isbn      = {979-8-4007-0159-7},
}
```
<br>

Contact Us

If you have questions regarding the dataset or code, please email us at shivangi.aneja@tum.de. We will get back to you as soon as possible.