FaceXHuBERT (ICMI '23)

Code repository for the paper:

FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis using Self-Supervised Speech Representation Learning.

Authors: Kazi Injamamul Haque, Zerrin Yumak

[Paper] [Project Page] [Video]

This GitHub repository contains the PyTorch implementation of the work presented in the paper above. Given raw audio, FaceXHuBERT generates and renders expressive 3D facial animation. We recommend visiting the project website and watching the supplementary video.

<p align="center"> <img src="FaceXHuBERT.png" width="90%" /> </p>
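For orientation, the sketch below shows how self-supervised HuBERT speech features can be extracted from raw audio with the Hugging Face implementation this work builds on (see Acknowledgement). It is a minimal sketch, not the repository's actual pipeline: the checkpoint name, input path, and availability of the `transformers` package are assumptions here.

```python
# Minimal sketch of the speech-representation step FaceXHuBERT builds on:
# extracting self-supervised HuBERT features from raw audio.
# Assumptions: the `transformers` package is installed, and the base
# checkpoint name/input path below are illustrative placeholders.
import torch
import torchaudio
from transformers import HubertModel

hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960")
hubert.eval()

wav, sr = torchaudio.load("demo/test.wav")   # hypothetical input file
wav = wav.mean(dim=0, keepdim=True)          # down-mix to mono
if sr != 16000:                              # HuBERT checkpoints expect 16 kHz
    wav = torchaudio.functional.resample(wav, sr, 16000)

with torch.no_grad():
    feats = hubert(wav).last_hidden_state    # shape: (1, frames, 768)
print(feats.shape)
```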

Environment

Dependencies

Get Started

We recommend creating a new Anaconda environment with Python 3.8. To do so, follow the steps below in order:

cd <repository_path>
conda create --name FaceXHuBERT python=3.8
conda activate FaceXHuBERT
conda env update --name FaceXHuBERT --file environment.yml
pip install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
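Optionally, you can sanity-check the installation from Python; this is a minimal check assuming the versions installed by the commands above:

```python
# Quick sanity check for the environment created above.
import torch

print(torch.__version__)          # expected: 1.10.1+cu113
print(torch.cuda.is_available())  # True if the CUDA 11.3 build can see a GPU
```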

Demo

Download the pretrained model from the [FaceXHuBERT model] link and place it under the `pretrained_model/` folder.
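As an optional check that the download is intact, you can try loading the checkpoint; the file name below is a placeholder for whatever the downloaded file is actually called:

```python
# Hypothetical checkpoint sanity check; the file name is a placeholder.
import torch

state = torch.load("pretrained_model/FaceXHuBERT.pth", map_location="cpu")
# A checkpoint is typically a state_dict, or a dict containing one.
print(type(state))
```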

Data

BIWI

The Biwi 3D Audiovisual Corpus of Affective Communication dataset is available upon request for research or academic purposes. You will need the following files from the dataset:

Data Preparation and Data Pre-process

Follow the steps below in the order they appear:
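While the repository's own scripts are the authoritative pipeline, the sketch below illustrates the kind of audio pre-processing this stage involves: converting each recording to 16 kHz mono, the rate HuBERT expects. The folder layout is an assumption made for illustration.

```python
# Hedged sketch of audio pre-processing: resample every recording to
# 16 kHz mono for HuBERT. The directory layout here is hypothetical;
# use the repository's pre-processing scripts for the real pipeline.
import os
import torchaudio

src_dir, dst_dir = "BIWI/audio_raw", "BIWI/wav"  # hypothetical layout
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    if not name.endswith(".wav"):
        continue
    wav, sr = torchaudio.load(os.path.join(src_dir, name))
    wav = wav.mean(dim=0, keepdim=True)                    # down-mix to mono
    wav = torchaudio.functional.resample(wav, sr, 16000)   # resample to 16 kHz
    torchaudio.save(os.path.join(dst_dir, name), wav, 16000)
```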

Model Training

Training and Testing

Visualization

Evaluation

Citation

If you find this code useful for your work, please consider citing our paper:

@inproceedings{FaceXHuBERT_Haque_ICMI23,
    author = {Haque, Kazi Injamamul and Yumak, Zerrin},
    title = {FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning},
    booktitle = {INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION (ICMI '23)},
    year = {2023},
    location = {Paris, France},
    numpages = {10},
    url = {https://doi.org/10.1145/3577190.3614157},
    doi = {10.1145/3577190.3614157},
    publisher = {ACM},
    address = {New York, NY, USA},
}

Acknowledgement

We would like to thank the authors of FaceFormer for making their code available. Thanks to ETH Zurich CVL for providing us access to the Biwi 3D Audiovisual Corpus. The HuBERT implementation is borrowed from Hugging Face.

License

This repository is released under the CC BY-NC 4.0 International license.