# Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
This repository contains a PyTorch re-implementation of the paper: Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis (CVPR 2024).
| Arxiv | Video |
<img src='assets/pipeline.png' width='1000'/>

## Installation
Requires Python 3.6+, CUDA 11.3+, and PyTorch 1.10+.
Tested on Linux with Anaconda3, Python 3.9, and PyTorch 1.10.
Please refer to `scripts/install.sh`:

```bash
conda create -n dyntet python=3.9
conda activate dyntet
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install ninja imageio PyOpenGL glfw xatlas gdown
pip install git+https://github.com/NVlabs/nvdiffrast/
pip install git+https://github.com/facebookresearch/pytorch3d/
pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn#subdirectory=bindings/torch
pip install scikit-learn configargparse face_alignment natsort matplotlib dominate tensorboard kornia trimesh open3d imageio-ffmpeg lpips easydict pysdf rich openpyxl gfpgan
```
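After installation, a quick import check can confirm that PyTorch sees the GPU and that the compiled extensions are usable. This is a minimal sketch, not a repository script:

```python
# Environment sanity check (a sketch, not part of this repository).
import torch
print(torch.__version__, "| CUDA available:", torch.cuda.is_available())

import nvdiffrast.torch as dr  # differentiable rasterizer used for rendering
import pytorch3d               # 3D utilities
print("nvdiffrast and pytorch3d imported OK")
```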
## Preparation
The following steps are adapted from AD-NeRF.
- Prepare the face-parsing model:

  ```bash
  wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
  ```
- Prepare the 3DMM model for head pose estimation:

  ```bash
  wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
  wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
  wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
  wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
  ```
- Download the 3DMM model from Basel Face Model 2009:

  ```bash
  # 1. copy 01_MorphableModel.mat to data_utils/face_tracking/3DMM/
  # 2. run the conversion script:
  cd data_utils/face_tracking && python convert_BFM.py
  ```
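Before running the conversion, it can be worth checking that the Basel model file is readable. A minimal sketch (scipy is pulled in by the dependencies above; the key names follow the standard BFM 2009 `.mat` layout, not anything DynTet-specific):

```python
# Sanity-check the Basel Face Model 2009 file (a sketch; the keys are the
# standard BFM 2009 .mat fields, not a DynTet-specific format).
from scipy.io import loadmat

bfm = loadmat("data_utils/face_tracking/3DMM/01_MorphableModel.mat")
print(bfm["shapeMU"].shape)  # mean face shape
print(bfm["shapePC"].shape)  # shape principal components
```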
In addition, the following steps are adapted from Deep3DFace. We use 3DMM coefficients to drive the talking heads.
- Download the pre-trained model using this link (google drive) and organize the directory into the following structure:

  ```
  data_utils
  └── Deep3DFaceRecon
      └── checkpoints
          └── facerecon
              └── epoch_20.pth
  ```
For evaluation, download the pre-trained ArcFace model and organize the directory into the following structure:

```
evaluate_utils
└── arcface
    └── model_ir_se50.pth
```
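A short script can confirm that both checkpoints landed where the code expects them; a minimal sketch assuming the directory trees above:

```python
# Verify that the downloaded checkpoints exist and load (a sketch; paths
# follow the directory trees shown above).
import os
import torch

for path in [
    "data_utils/Deep3DFaceRecon/checkpoints/facerecon/epoch_20.pth",
    "evaluate_utils/arcface/model_ir_se50.pth",
]:
    assert os.path.isfile(path), f"missing checkpoint: {path}"
    ckpt = torch.load(path, map_location="cpu")
    print(path, "->", type(ckpt).__name__)
```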
## Usage
### Pre-processing
- Put the training video under `data/video/<ID>.mp4`.
  - The video must be 25 fps, with the talking person visible in every frame (a quick format check is sketched below).
  - Due to the use of nvdiffrast, the video width and height are processed into multiples of 8, such as 448×448 or 512×512.
The experiment videos come mainly from AD-NeRF, ER-NeRF, GeneFace, and YouTube. Due to copyright restrictions, we cannot distribute all of them, so you may have to download and crop these videos yourself. Here is an example training video (Obama) from AD-NeRF:
```bash
mkdir -p data/video
wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/video/obama.mp4
```
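To check that a video meets the requirements above before preprocessing, something like the following works; a minimal sketch using imageio, which the installation step already provides:

```python
# Check the fps and frame size of a training video (a sketch; imageio and
# imageio-ffmpeg come from the environment setup above).
import imageio

reader = imageio.get_reader("data/video/obama.mp4")
meta = reader.get_meta_data()
print("fps:", meta["fps"], "| size:", meta["size"])
assert round(meta["fps"]) == 25, "training videos must be 25 fps"
# Width/height need not be multiples of 8 here; preprocessing adjusts them.
```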
- Run the script to process the video (this may take several hours):

  ```bash
  python data_utils/process.py --path "data/video/obama.mp4" --save_dir "data/video/obama" --task -1
  ```
### Train
To train the model on the Obama video:
```bash
python train.py --config configs/obama.json
```
### Evaluation
To evaluate the trained model on the validation dataset:
```bash
python evaluate_utils/evaluate.py --train_dir out/obama
```
### Inference
To synthesize the videos of the validation dataset:
```bash
python infer.py --config configs/obama.json
```
To synthesize a video driven by customized 3DMM coefficients and (optionally) merge the video and audio:
```bash
python infer.py --config configs/obama.json --drive_3dmm data/test_audio/obama_sing_sadtalker.npy --audio data/test_audio/sing.wav
```
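The `--drive_3dmm` file is a NumPy array of 3DMM coefficients. A quick way to inspect one (a sketch; the exact coefficient layout is produced by the preprocessing or SadTalker pipeline and is an assumption here):

```python
# Peek at a driving-coefficient file (a sketch; the per-frame layout is
# produced by preprocessing or SadTalker and is an assumption here).
import numpy as np

coeffs = np.load("data/test_audio/obama_sing_sadtalker.npy")
print(coeffs.shape)  # assumed (num_frames, coeff_dim): one row per frame
```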
Note: given an audio file (e.g., `AUDIO.wav`), you can try SadTalker to generate the 3DMM coefficient file (e.g., `FILE.mat`), then run:

```bash
python infer.py --config configs/obama.json --drive_3dmm FILE.mat --audio AUDIO.wav
```
## TODO
- Release code.
- Consider uploading a script that fine-tunes GFPGAN on DynTet to enhance the visual quality of the talking heads.
## Citation
If you find this repository helpful to your project, please consider citing:
```bibtex
@InProceedings{zhang2024learning,
  title     = {Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis},
  author    = {Zicheng Zhang and Ruobing Zheng and Ziwen Liu and Congying Han and Tianqi Li and Meng Wang and Tiande Guo and Jingdong Chen and Bonan Li and Ming Yang},
  booktitle = {CVPR},
  year      = {2024}
}
```
## Acknowledgements
This code was developed relying heavily on AD-NeRF for data processing, nvdiffrec for Marching Tetrahedra, and Deep3DFace for 3DMM extraction. Some of the code is drawn from OTAvatar, RAD-NeRF, and ER-NeRF. Thanks to these great projects. Please follow the licenses of the above open-source code.