

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

This repository contains a PyTorch re-implementation of the paper: Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis (CVPR 2024).

| Arxiv | Video |

<img src='assets/pipeline.png' width='1000'/>


Requires Python 3.6+, CUDA 11.3+, and PyTorch 1.10+.

Tested on Linux with Anaconda3, Python 3.9, and PyTorch 1.10.

For installation, please refer to scripts/install.sh:

conda create -n dyntet python=3.9
conda activate dyntet
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install ninja imageio PyOpenGL glfw xatlas gdown
pip install git+https://github.com/NVlabs/nvdiffrast/
pip install git+https://github.com/facebookresearch/pytorch3d/
pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn#subdirectory=bindings/torch
pip install scikit-learn configargparse face_alignment natsort matplotlib dominate tensorboard kornia trimesh open3d imageio-ffmpeg lpips easydict pysdf rich openpyxl gfpgan
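
After installation, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the check assumes that `torch.version.cuda` reports the CUDA version PyTorch was built against (it is `None` on CPU-only builds).

```python
import importlib
import platform
import re


def parse_version(text, width=3):
    """Turn a string such as '1.10.2+cu113' into a padded tuple like (1, 10, 2)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    numbers = numbers[:width]
    return tuple(numbers + [0] * (width - len(numbers)))


def meets_minimum(found, required):
    """Return True if the version string `found` is at least `required`."""
    return parse_version(found) >= parse_version(required)


def check_environment():
    """Compare the local setup against the stated minimums (hypothetical helper)."""
    report = {"python": meets_minimum(platform.python_version(), "3.6")}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed, so neither requirement can be met.
        report["torch"] = False
        report["cuda"] = False
        return report
    report["torch"] = meets_minimum(torch.__version__, "1.10")
    cuda = torch.version.cuda  # None on CPU-only builds
    report["cuda"] = cuda is not None and meets_minimum(cuda, "11.3")
    return report
```

Calling `check_environment()` inside the `dyntet` environment returns a dictionary of booleans; any `False` entry points to a requirement that is not satisfied.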


The data processing steps follow AD-NeRF.

In addition, 3DMM extraction follows Deep3DFace. We use 3DMM coefficients to drive the talking heads. The expected checkpoint layout is:

    └─── checkpoints
        └─── facerecon
            └─── epoch_20.pth

For evaluation, download the pre-trained ArcFace model and organize the directory into the following structure:

    └─── model_ir_se50.pth
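
A missing checkpoint is easier to diagnose before a long run than during it. The sketch below is a hypothetical helper, not part of the repository, that reports which expected files are absent. The parent directories are not shown in the trees above, so `root` is a placeholder that you must point at the right folder yourself.

```python
from pathlib import Path


def missing_files(root, relative_paths):
    """Return the entries of `relative_paths` that are not files under `root`."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]
```

For example, `missing_files(ROOT, ["checkpoints/facerecon/epoch_20.pth"])` returns an empty list when the Deep3DFace checkpoint is in place under `ROOT`, where `ROOT` stands for whichever directory holds the `checkpoints` folder in your setup.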




To train the model on the Obama video:

python train.py --config configs/obama.json


To evaluate the trained model on the validation dataset:

python evaluate_utils/evaluate.py --train_dir out/obama


To render the video for the validation dataset:

python infer.py --config configs/obama.json

To render a video driven by custom 3DMM coefficients and, optionally, merge the video with audio:

python infer.py --config configs/obama.json --drive_3dmm data/test_audio/obama_sing_sadtalker.npy --audio data/test_audio/sing.wav

Note: Given an audio file (e.g., AUDIO.wav), you can use SadTalker to generate a 3DMM coefficient .mat file (e.g., FILE.mat), then run:

python infer.py --config configs/obama.json --drive_3dmm FILE.mat --audio AUDIO.wav
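
When several driving files need to be rendered, the command above can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_infer_command` is a hypothetical helper, it uses only the `--config`, `--drive_3dmm`, and `--audio` flags shown in this README, and it assumes each flag may simply be omitted when its value is not given.

```python
import shlex


def build_infer_command(config, drive_3dmm=None, audio=None):
    """Build the argument list for infer.py from the flags shown in this README."""
    cmd = ["python", "infer.py", "--config", config]
    if drive_3dmm is not None:
        cmd += ["--drive_3dmm", drive_3dmm]
    if audio is not None:
        cmd += ["--audio", audio]
    return cmd


def format_command(cmd):
    """Render the argument list as a shell-safe string for logging."""
    return shlex.join(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, once per pair of coefficient and audio files.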



If you find this repository helpful for your project, please consider citing:

    title={Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis}, 
    author={Zicheng Zhang and Ruobing Zheng and Ziwen Liu and Congying Han and Tianqi Li and Meng Wang and Tiande Guo and Jingdong Chen and Bonan Li and Ming Yang},


This code relies heavily on AD-NeRF for data processing, nvdiffrec for Marching Tetrahedra, and Deep3DFace for 3DMM extraction. Some of the code is drawn from OTAvatar, RAD-NeRF, and ER-NeRF. Thanks to the authors of these great projects. Please follow the licenses of the above open-source code.