Speech2Lip

Official PyTorch implementation for the paper "Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video".

Project Page | Paper

Feel free to contact xzwu@eee.hku.hk if you have any questions about the code.

Prerequisites
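
This section does not pin exact requirements, so below is only a minimal environment sketch, assuming a standard PyTorch stack. The environment name, Python version, and the presence of a requirements.txt are assumptions, not guarantees; consult the repository for the authoritative dependency list.

```bash
# Minimal setup sketch -- environment name and versions are assumptions, not pinned by this README.
conda create -n speech2lip python=3.8 -y
conda activate speech2lip

# Install a PyTorch build matching your CUDA version (see pytorch.org for the exact command).
pip install torch torchvision

# If the repository ships a requirements.txt, install the remaining dependencies from it.
pip install -r requirements.txt
```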

Data Preprocessing

The source videos used in our experiments are referred to as LSP and YouTube Video. In this example, we use May's video and provide the corresponding bash scripts. After data preprocessing, the training data will be created in the dataset/may_face_crop_lip/ directory; replace it with your own data to train on a different video.
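
As a rough sketch of how such a preprocessing script is typically invoked (the script path below is a hypothetical placeholder; the actual entry point is among the bash scripts shipped with the repository):

```bash
# Hypothetical invocation -- the script name is a placeholder for the provided bash scripts.
bash scripts/preprocess_may.sh

# Expected result: training data under dataset/may_face_crop_lip/
ls dataset/may_face_crop_lip/
```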

Train Speech2Lip

We use May's video as an example and provide the bash scripts.
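
A minimal training sketch, assuming a per-identity bash script as described above (the script name is a hypothetical placeholder; use the scripts shipped with the repository):

```bash
# Hypothetical invocation -- the real entry point is one of the provided bash scripts.
bash scripts/train_may.sh
# Checkpoints are typically written to a run-specific output directory (path is an assumption).
```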

Pretrained Models

Inference

As with training, we use May's video as an example and provide the bash scripts.
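
A minimal inference sketch, assuming a per-identity bash script and a driving audio file (both names below are hypothetical placeholders; consult the provided bash scripts for the actual command):

```bash
# Hypothetical invocation -- script name and flags are placeholders, not the repository's actual CLI.
bash scripts/inference_may.sh

# A common pattern is to pass the driving audio explicitly; the flag below is an assumption:
# bash scripts/inference_may.sh --audio demo/test_audio.wav
```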

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{wu2023speech2lip,
  title={Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video},
  author={Wu, Xiuzhe and Hu, Pengfei and Wu, Yang and Lyu, Xiaoyang and Cao, Yan-Pei and Shan, Ying and Yang, Wenming and Sun, Zhongqian and Qi, Xiaojuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={22168--22177},
  year={2023}
}

Acknowledgments

We use face-parsing.PyTorch to compute the head mask in the canonical space, DeepSpeech for audio feature extraction, and Wav2Lip for the sync expert network. We are also highly grateful to ADNeRF for their data preprocessing script.