
Audio-Driven Emotional Video Portraits [CVPR 2021]

Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu

[Project] [Paper]

[visualization]

Given an audio clip and a target video, our Emotional Video Portraits (EVP) approach generates emotion-controllable talking portraits and can change their emotion smoothly by interpolating in the latent space.
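
The smooth emotion control comes from linearly interpolating between emotion codes in the learned latent space. A minimal sketch of that interpolation, where the names and the 128-dimensional codes are illustrative assumptions, not the repository's API:

    import torch

    # Hypothetical emotion codes produced by a trained emotion encoder
    z_happy = torch.randn(128)
    z_sad = torch.randn(128)

    # Sweeping alpha from 0 to 1 moves the portrait smoothly from sad to
    # happy; each z_mix would condition the generator
    for alpha in torch.linspace(0.0, 1.0, steps=5):
        z_mix = alpha * z_happy + (1 - alpha) * z_sad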

Installation

We train and test with Python 3.6 and PyTorch. To install the dependencies, run:

pip install -r requirements.txt

Testing

Training

Emotion pretraining

  1. Generate the training data (MFCC features) from the raw audio (sketch below):

     python emotion_pretrain/code/mfcc_preprocess.py

  2. Train the emotion classifier on the MFCC features (sketch below):

     python emotion_pretrain/code/train.py
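
For reference, the kind of MFCC extraction mfcc_preprocess.py performs can be sketched with librosa; the library choice, sample rate, and frame parameters below are assumptions, not the script's exact settings:

    import librosa
    import numpy as np

    # Load a raw audio clip; 16 kHz mono is a common choice for speech
    y, sr = librosa.load("clip.wav", sr=16000)

    # 13-dimensional MFCCs; window and hop sizes are illustrative
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=512, hop_length=160)

    np.save("clip_mfcc.npy", mfcc)  # shape: (13, num_frames)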
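
The classifier itself is defined in emotion_pretrain/code/train.py; below is a minimal PyTorch sketch of a sequence classifier over MFCC frames, where the LSTM backbone and the eight emotion classes are both assumptions:

    import torch
    import torch.nn as nn

    class MFCCEmotionClassifier(nn.Module):
        """Classify an MFCC sequence into discrete emotion labels (sketch)."""
        def __init__(self, n_mfcc=13, hidden=128, n_emotions=8):
            super().__init__()
            self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_emotions)

        def forward(self, x):          # x: (batch, frames, n_mfcc)
            _, (h, _) = self.lstm(x)   # h: (1, batch, hidden)
            return self.head(h[-1])    # logits: (batch, n_emotions)

    model = MFCCEmotionClassifier()
    logits = model(torch.randn(4, 100, 13))  # 4 clips of 100 frames each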
    
Disentanglement

  1. Use DTW to align audio clips that share the same content (sketch below):

     python disentanglement/dtw/MFCC_dtw.py

  2. Train the cross-reconstruction model for disentanglement (sketch below):

     python disentanglement/code/train_content+cla.py
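
MFCC_dtw.py aligns utterances of the same sentence spoken with different emotions so that their frames correspond. A minimal numpy sketch of the underlying DTW recursion (the repository may use a library implementation instead):

    import numpy as np

    def dtw_cost(a, b):
        """Accumulated DTW cost between feature sequences a (n, d) and
        b (m, d); the warping path is recovered by backtracking."""
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])   # frame distance
                cost[i, j] = d + min(cost[i - 1, j],      # skip a frame of a
                                     cost[i, j - 1],      # skip a frame of b
                                     cost[i - 1, j - 1])  # match frames
        return cost[1:, 1:]

    # Two MFCC sequences of the same sentence (frames x coefficients)
    acc = dtw_cost(np.random.rand(100, 13), np.random.rand(120, 13))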
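
Cross-reconstruction trains a content encoder and an emotion encoder so that swapping emotion codes between two aligned clips still reconstructs the right target. A schematic PyTorch training step; every module, dimension, and loss choice here is an illustrative assumption, not the repository's implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical encoders/decoder standing in for disentanglement/code
    content_enc = nn.LSTM(13, 128, batch_first=True)  # per-frame content
    emotion_enc = nn.LSTM(13, 128, batch_first=True)  # clip-level emotion
    decoder = nn.Linear(256, 13)

    def cross_reconstruct(x_content, x_emotion):
        """Decode x_content's content with x_emotion's emotion code."""
        c, _ = content_enc(x_content)                 # (batch, frames, 128)
        _, (e, _) = emotion_enc(x_emotion)
        e = e[-1].unsqueeze(1).expand(-1, c.size(1), -1)
        return decoder(torch.cat([c, e], dim=-1))     # (batch, frames, 13)

    # x_a and x_b are DTW-aligned MFCCs of one sentence in two emotions, so
    # content(x_a) + emotion(x_b) should reconstruct x_b, and vice versa
    x_a, x_b = torch.randn(2, 4, 100, 13)
    loss = (F.mse_loss(cross_reconstruct(x_a, x_b), x_b)
            + F.mse_loss(cross_reconstruct(x_b, x_a), x_a))
    loss.backward()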
    
Audio-to-Landmark

  1. Generate the data for training (sketch below):

     python landmark/code/preprocess.py

  2. Train the Audio-to-Landmark module (sketch below):

     python landmark/code/train.py
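
preprocess.py builds the paired audio-feature/landmark training data. One typical step is normalizing each frame's landmarks so head position and scale do not leak into training; a numpy sketch under that assumption (the script's exact normalization may differ):

    import numpy as np

    def normalize_landmarks(lm):
        """Center and scale one frame's (68, 2) landmark array."""
        lm = lm - lm.mean(axis=0)                      # remove translation
        return lm / np.linalg.norm(lm, axis=1).max()   # remove scale

    frame_lm = np.random.rand(68, 2)  # hypothetical detector output
    np.save("frame_lm.npy", normalize_landmarks(frame_lm))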
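
The Audio-to-Landmark module maps audio features to per-frame facial landmarks. A schematic PyTorch regressor; the feature dimension, hidden size, and LSTM architecture are assumptions for illustration:

    import torch
    import torch.nn as nn

    class Audio2Landmark(nn.Module):
        """Regress 68 two-dimensional landmarks per frame (sketch)."""
        def __init__(self, feat_dim=256, hidden=256, n_points=68):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_points * 2)

        def forward(self, feats):      # feats: (batch, frames, feat_dim)
            h, _ = self.lstm(feats)
            return self.out(h)         # (batch, frames, 136)

    model = Audio2Landmark()
    pred = model(torch.randn(4, 100, 256))  # landmarks for 100 frames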
    

Citation

@article{ji2021audio,
  title={Audio-Driven Emotional Video Portraits},
  author={Ji, Xinya and Zhou, Hang and Wang, Kaisiyuan and Wu, Wayne and Loy, Chen Change and Cao, Xun and Xu, Feng},
  journal={arXiv preprint arXiv:2104.07452},
  year={2021}
}