# StyleTalk

The official repository of the AAAI 2023 paper *StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles*.
<p align='center'> <b> <a href="https://arxiv.org/abs/2301.01081">Paper</a> | <a href="https://drive.google.com/file/d/19WRhBHYVWRIH8_zo332l00fLXfUE96-k/view?usp=share_link">Supp. Materials</a> | <a href="https://youtu.be/mO2Tjcwr4u8">Video</a> </b> </p>

<p align='center'> <img src='media/first_page.png' width='700'/> </p>

The proposed StyleTalk can generate talking head videos with speaking styles specified by arbitrary style reference videos.
## News
- April 14th, 2023. The code is available.
## Get Started

### Installation
Clone this repo, install conda, and run:

```bash
conda create -n styletalk python=3.7.0
conda activate styletalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg
```
The code has been tested with CUDA 11.1 on an RTX 3090 GPU.
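A quick way to confirm the environment is set up correctly is to check that the pinned PyTorch build sees the GPU. A minimal sketch (not part of the repo):

```python
import torch
import torchvision

# Confirm the pinned versions installed above and that CUDA is usable.
print(f"torch {torch.__version__}, torchvision {torchvision.__version__}")
assert torch.cuda.is_available(), "CUDA not available; check the cudatoolkit install"
print(f"CUDA device: {torch.cuda.get_device_name(0)}")
```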
### Data Preprocessing
Our method takes 3DMM parameters (`*.mat`) and phoneme labels (`*_seq.json`) as input. Follow PIRenderer to extract the 3DMM parameters and AVCT to extract the phoneme labels. Some preprocessed data can be found in the `samples` folder.
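If you want to sanity-check the preprocessed inputs before running inference, a minimal sketch like the following can help. It assumes the 3DMM files are standard MATLAB `.mat` files loadable with `scipy.io.loadmat` and the phoneme labels are plain JSON; the field names inside the files are not documented here and may differ:

```python
import json
from scipy.io import loadmat

# Load a 3DMM parameter file produced by the PIRenderer preprocessing.
# NOTE: the keys/shapes printed here depend on the preprocessing pipeline.
mat = loadmat("samples/style_clips/3DMM/happyenglish_clip1.mat")
print({k: getattr(v, "shape", None) for k, v in mat.items() if not k.startswith("__")})

# Load a phoneme-label sequence produced by the AVCT preprocessing.
with open("samples/source_video/phoneme/reagan_clip1_seq.json") as f:
    phonemes = json.load(f)
print(type(phonemes), len(phonemes))
```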
### Inference
Download the checkpoints for StyleTalk and the Renderer and put them into `./checkpoints`.
Run the demo:
```bash
python inference_for_demo.py \
  --audio_path samples/source_video/phoneme/reagan_clip1_seq.json \
  --style_clip_path samples/style_clips/3DMM/happyenglish_clip1.mat \
  --pose_path samples/source_video/3DMM/reagan_clip1.mat \
  --src_img_path samples/source_video/image/andrew_clip_1.png \
  --wav_path samples/source_video/wav/reagan_clip1.wav \
  --output_path demo.mp4
```
Change `audio_path`, `style_clip_path`, `pose_path`, `src_img_path`, `wav_path`, and `output_path` to generate more results.
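To produce several results in one go, the demo script can be wrapped in a small driver. A sketch under the assumption that each run is independent; the `jobs` list below is illustrative and simply reuses the sample paths:

```python
import subprocess

# Illustrative job list: (phoneme JSON, style .mat, pose .mat, source image, wav, output).
jobs = [
    ("samples/source_video/phoneme/reagan_clip1_seq.json",
     "samples/style_clips/3DMM/happyenglish_clip1.mat",
     "samples/source_video/3DMM/reagan_clip1.mat",
     "samples/source_video/image/andrew_clip_1.png",
     "samples/source_video/wav/reagan_clip1.wav",
     "demo_happy.mp4"),
]

for audio, style, pose, src, wav, out in jobs:
    # Invoke the demo script once per job, failing fast on errors.
    subprocess.run([
        "python", "inference_for_demo.py",
        "--audio_path", audio,
        "--style_clip_path", style,
        "--pose_path", pose,
        "--src_img_path", src,
        "--wav_path", wav,
        "--output_path", out,
    ], check=True)
```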
## Acknowledgement
Some code is borrowed from the following projects:
Thanks for their contributions!