3D Morphable Models as Spatial Transformer Networks

Update: A simple gradient descent example has been added to show how the layers work. Please see demo.m.

This page shows how to use a 3D morphable model as a spatial transformer within a convolutional neural network (CNN). It is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The network (specifically, the localiser part of the network) learns to fit a 3D morphable model to a single 2D image without needing labelled examples of fitted models.

<p align="center"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/elon_musk_34.jpg" alt="Elon Musk (34)" title="Elon Musk (34)" width="19.4%"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/christian_bale_51.jpg" alt="Christian Bale (51)" title="Christian Bale (51)" width="19.4%"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/elisha_cuthbert_53.jpg" alt="Elisha Cuthbert (53)" title="Elisha Cuthbert (53)" width="19.4%"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/clint_eastwood_62.jpg" alt="Clint Eastwood (62)" title="Clint Eastwood (62)" width="19.4%"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/emma_watson_73.jpg" alt="Emma Watson (73)" title="Emma Watson (73)" width="19.4%"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/chuck_palahniuk_48.jpg" alt="Chuck Palahniuk (48)" title="Chuck Palahniuk (48)" width="19.4%"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/nelson_mandela_52.jpg" alt="Nelson Mandela (52)" title="Nelson Mandela (52)" width="19.4%"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/kim_jong-un_60.jpg" alt="Kim Jong-un (60)" title="Kim Jong-un (60)" width="19.4%"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/ben_affleck_66.jpg" alt="Ben Affleck (66)" title="Ben Affleck (66)" width="19.4%"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/average/courteney_cox_127.jpg" alt="Courteney Cox (127)" title="Courteney Cox (127)" width="19.4%"> </p> <p align="center"> A set of mean flattened images that are obtained by applying the 3DMM-STN to multiple images of the same person from the <a href="http://www.umdfaces.io">UMDFaces Dataset</a>.<br><i>(Please hover over the image to see the subject's name and the number of images used for averaging)</i> </p>

The proposed architecture is based on a purely geometric approach in which only the shape component of a 3DMM is used to geometrically normalise an image. Our method can be trained in an unsupervised fashion, and thus does not depend on synthetic training data or the fitting results of an existing algorithm.

In contrast to all previous 3DMM fitting networks, the output of our 3DMM-STN is a 2D resampling of the original image, which retains all of the high-frequency, discriminating detail in a face, rather than a model-based reconstruction, which captures only the gross, low-frequency aspects of appearance that a 3DMM can explain.

Citation

Please cite the following paper (DOI) if you use this work in your research:

A. Bas, P. Huber, W.A.P. Smith, M. Awais and J. Kittler. "3D Morphable Models as Spatial Transformer Networks". In Proc. ICCV Workshop on Geometry Meets Deep Learning, pp. 904-912, 2017.

Usage & Training

We train our network using the MatConvNet library. Please refer to the installation page for instructions.
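For quick reference, a typical MatConvNet setup looks like the following (a minimal sketch; the library path is an assumption and GPU compilation is optional):

```matlab
% Minimal MatConvNet setup sketch; the library path is an assumption.
run('matconvnet/matlab/vl_setupnn.m');   % add MatConvNet to the MATLAB path
vl_compilenn('enableGpu', true);         % compile the binaries (GPU optional)
```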

Before starting the training, you need to create the resampled expression model. This requires (1) the Basel Face Model (01_MorphableModel.mat) and (2) the 3DDFA Expression Model (Model_Expression.mat). Set the paths accordingly and run the prepareExpressionBFM function in the prepareModel folder to build the resampled expression model, as in the sketch below.
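A minimal sketch of this step (the exact interface of prepareExpressionBFM may differ; the path variables below are assumptions):

```matlab
% Illustrative paths; replace with the locations of your model files.
bfmPath = 'path/to/01_MorphableModel.mat';  % Basel Face Model
expPath = 'path/to/Model_Expression.mat';   % 3DDFA Expression Model

addpath('prepareModel');
% The call signature is an assumption; check prepareModel/prepareExpressionBFM.m.
prepareExpressionBFM(bfmPath, expPath);     % writes the resampled expression model
```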

Finally, run the dagnn_3dmmasstn.m script to start the training.

<img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/fig1.png" alt="Overview of the 3DMM-STN" width="50%"><img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/fig2.png" alt="The grid generator network within a 3DMM-STN" width="50%">

Localiser Network

The localiser network is a CNN that takes an image as input and regresses the pose and shape parameters θ = (r, t, log s, α). For our localiser network, we use the pre-trained VGG-Faces architecture, delete the classification layer, and add a new fully connected layer with 6 + D outputs. The pre-trained models can be downloaded from the MatConvNet model repository. A rough sketch of this adaptation is given below.
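The sketch below shows the adaptation in MatConvNet's DagNN wrapper. It is a sketch only: the model file name, the layer and variable names in the pre-trained network, and the value of D are all assumptions; inspect net.layers and net.vars for the actual names.

```matlab
% Load the pre-trained VGG-Faces model (file name is an assumption).
net = dagnn.DagNN.loadobj(load('vgg-face.mat'));

% Remove the softmax and classification layers (layer names are assumptions).
net.removeLayer('prob');
net.removeLayer('fc8');

% Add a fully connected layer (a 1x1 convolution in MatConvNet) with
% 6 + D outputs: r (3), t (2), log s (1), and D shape parameters alpha.
D = 10;                                   % example number of shape parameters
fc = dagnn.Conv('size', [1 1 4096 6+D], 'hasBias', true);
net.addLayer('theta', fc, {'x17'}, {'theta'}, {'theta_f', 'theta_b'});
% 'x17' stands for the output variable of the last retained layer;
% the actual variable name depends on the loaded model.
```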

Grid Generator Network

Our grid generator combines a linear statistical model with a scaled orthographic projection. We apply a 3D transformation and projection to a 3D mesh that comes from the morphable model. The intensities sampled from the source image are then assigned to the corresponding points in a flattened 2D grid.
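As a concrete illustration, the scaled orthographic projection of the model shape can be written as follows. This is a minimal sketch: alpha, r, t, and logs stand in for the localiser outputs, while shapeMU and shapePC are the model's mean shape and principal components.

```matlab
% Reconstruct the model shape from the linear statistical model (3xN).
X = reshape(shapeMU + shapePC * alpha, 3, []);

% Rotation matrix from the axis-angle vector r via Rodrigues' formula.
angle = norm(r);
k = r(:) / max(angle, eps);               % rotation axis (guard against angle = 0)
K = [0 -k(3) k(2); k(3) 0 -k(1); -k(2) k(1) 0];
R = eye(3) + sin(angle) * K + (1 - cos(angle)) * K^2;

% Scaled orthographic projection: rotate, keep x and y, scale, translate.
% t is a 2x1 translation, expanded across the N columns (R2016b+).
Y = exp(logs) * R(1:2, :) * X + t(:);     % 2xN projected points
```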

UV texture space embedding for Basel Face Model

The output of our 3DMM-STN is a resampled image in a flattened 2D texture space in which the images are in dense, pixel-wise correspondence. In other words, the output grid is a texture-space flattening of the 3DMM mesh. Specifically, we compute a Tutte embedding using conformal Laplacian weights, with the mesh boundary mapped to a square. To ensure a symmetric embedding, we map the face's symmetry line to the symmetry line of the square, flatten only one half of the mesh, and obtain the flattening of the other half by reflection.

You can find the UV coordinates in the BFM_UV.mat file in the util folder; a quick way to inspect them is sketched below.
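A sketch for loading and plotting the embedding (the variable name UV inside the file is an assumption):

```matlab
% Load and plot the UV coordinates (variable name is an assumption).
load(fullfile('util', 'BFM_UV.mat'));            % provides the UV coordinates
plot(UV(:, 1), UV(:, 2), '.', 'MarkerSize', 1);  % one point per mesh vertex
axis equal tight;
title('UV embedding of the BFM mesh');
```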

<p align="center"> <img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/UV.png" alt="The output grid visualisation using the mean texture" width="25%"><img src="https://github.com/anilbas/3DMMasSTN/blob/master/img/geometry.png" alt="The mean shape as a geometry image" width="25%"> </p>

Customised Layers

In this section, we summarise our customised layers and loss functions. Please refer to the paper for more details.

Geometric Loss Functions

Dependencies