EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022 Conference]
Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, Xun Cao
Given a single portrait image, we synthesize emotional talking faces whose mouth movements match the input audio and whose facial emotion dynamics follow the emotion source video.
Installation
We train and test with Python 3.6 and PyTorch. To install the dependencies, run:
pip install -r requirements.txt
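After installing, a quick sanity check (our suggestion, not part of the repo) confirms that PyTorch imports correctly and sees your GPU:

```python
# Minimal environment sanity check; not part of the EAMM codebase.
import torch

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```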
Testing
- Download the pre-trained models and data from the following link: google-drive, and place the files in the corresponding directories.
- Run the demo:
python demo.py --source_image path/to/image --driving_video path/to/emotion_video --pose_file path/to/pose --in_file path/to/audio --emotion emotion_type
- Prepare the testing data (see the sketch after this list):
prepare source_image -- crop_image in process_data.py
prepare driving_video -- crop_image_tem in process_data.py
prepare pose -- detect the head pose with 3DDFA_V2
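As a rough guide, the preparation steps above could be scripted as follows. 'crop_image' and 'crop_image_tem' are named in this README, but the argument lists below are hypothetical placeholders, not their real signatures in process_data.py:

```python
# Hedged sketch of test-data preparation; argument lists are hypothetical.
from process_data import crop_image, crop_image_tem

# Crop the single source portrait into the format the model expects.
crop_image("raw/portrait.jpg", "test/source_image.png")       # hypothetical args

# Crop the emotion-source video in the same way, frame by frame.
crop_image_tem("raw/emotion_clip.mp4", "test/driving_video")  # hypothetical args

# Head pose is not produced here: run 3DDFA_V2 (its demo.py) on the cropped
# video to obtain the .npy pose file passed to demo.py via --pose_file.
```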
Training
- Training data structure:

./data/<dataset_name>
├──fomm_crop
│  ├──id/file_name          # cropped images
│  │  ├──0.png
│  │  ├──...
├──fomm_pose_crop
│  ├──id
│  │  ├──file_name.npy      # pose of the cropped images
│  │  ├──...
├──MFCC
│  ├──id
│  │  ├──file_name.npy      # MFCC of the audio
│  │  ├──...

*The cropped images are generated by 'crop_image_tem' in process_data.py.
*The poses of the cropped videos are generated by 3DDFA_V2/demo.py.
*The MFCCs of the audio are generated by 'audio2mfcc' in process_data.py (see the sketch after this list).
- Step 1: Train the Audio2Facial-Dynamics Module on the LRW dataset (a wrapper sketch chaining all three stages follows this list):
python run.py --config config/train_part1.yaml --mode train_part1 --checkpoint log/124_52000.pth.tar
- Step 2: Fine-tune the Audio2Facial-Dynamics Module once Step 1 produces stable results:
python run.py --config config/train_part1_fine_tune.yaml --mode train_part1_fine_tune --checkpoint log/124_52000.pth.tar --audio_chechpoint checkpoint/from/step_1
- Step 3: Train the Implicit Emotion Displacement Learner:
python run.py --config config/train_part2.yaml --mode train_part2 --checkpoint log/124_52000.pth.tar --audio_chechpoint checkpoint/from/step_2
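The MFCC features in the data tree above come from 'audio2mfcc' in process_data.py. Below is a minimal sketch of such an extraction step, assuming a 16 kHz sample rate and the python_speech_features defaults (25 ms windows, 10 ms hop, 13 coefficients); these are common choices, not values confirmed from the EAMM code:

```python
# Hedged sketch in the spirit of 'audio2mfcc'; parameters are assumed defaults.
import numpy as np
import librosa
from python_speech_features import mfcc

def audio_to_mfcc(wav_path, out_path, sr=16000):
    signal, _ = librosa.load(wav_path, sr=sr)       # load and resample audio
    feats = mfcc(signal, samplerate=sr, numcep=13)  # shape: (n_frames, 13)
    np.save(out_path, feats)                        # -> MFCC/id/file_name.npy
    return feats
```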
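To make the checkpoint hand-off between the three stages explicit, a small wrapper like the one below (our sketch, not a repo script) chains the commands above. The two stage checkpoints are placeholders for the files your own runs produce, and the '--audio_chechpoint' spelling is kept verbatim from the commands above:

```python
# Hedged stage runner; checkpoint paths after Step 1 are placeholders.
import subprocess

def run_stage(config, mode, audio_ckpt=None):
    cmd = ["python", "run.py", "--config", config, "--mode", mode,
           "--checkpoint", "log/124_52000.pth.tar"]
    if audio_ckpt is not None:
        cmd += ["--audio_chechpoint", audio_ckpt]  # spelling as in the README
    subprocess.run(cmd, check=True)

run_stage("config/train_part1.yaml", "train_part1")                      # Step 1
run_stage("config/train_part1_fine_tune.yaml", "train_part1_fine_tune",
          audio_ckpt="log/step1.pth.tar")                                # Step 2
run_stage("config/train_part2.yaml", "train_part2",
          audio_ckpt="log/step2.pth.tar")                                # Step 3
```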
Citation
@inproceedings{10.1145/3528233.3530745,
  author    = {Ji, Xinya and Zhou, Hang and Wang, Kaisiyuan and Wu, Qianyi and Wu, Wayne and Xu, Feng and Cao, Xun},
  title     = {EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model},
  year      = {2022},
  isbn      = {9781450393379},
  url       = {https://doi.org/10.1145/3528233.3530745},
  doi       = {10.1145/3528233.3530745},
  booktitle = {ACM SIGGRAPH 2022 Conference Proceedings},
  series    = {SIGGRAPH '22}
}