Speech Driven Tongue Animation
Advances in speech-driven animation techniques now allow the creation of convincing animations of virtual characters solely from audio data. While many approaches focus on facial and lip motion, they often do not provide realistic animation of the inner mouth. Performance or motion capture of the tongue and jaw from video alone is difficult because the inner mouth is only partially observable during speech. In this work, we collected a large-scale speech-to-tongue mocap dataset that focuses on capturing tongue, jaw, and lip motion during speech. This dataset enables research on data-driven techniques for realistic inner-mouth animation. We present a method that leverages recent deep-learning-based audio feature representations to build a robust and generalizable speech-to-animation pipeline. We find that self-supervised deep-learning-based audio feature encoders are robust and generalize well to unseen speakers and content.
Links: [Project] | [Paper] | [Video] | [Data]
Data
The data can be downloaded from this link. The dataset includes:
- Mono audio in `wav` format with a sample rate of 16 kHz
- EMA 3D landmark sequences @ 50 FPS
- Audio transcripts
Code
🚧🚧🚧 UNDER CONSTRUCTION 🚧🚧🚧
Installation
Conda Environment
Create the conda environment from the yaml file `envs/tongueanim.yaml`:

```bash
conda env create -f envs/tongueanim.yaml
```
Wav2Vec
Our best model uses Wav2Vec audio features. To use it, download the pre-trained model from the Fairseq repository and place it under the `models/` folder.
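As a minimal loading sketch (assuming the original wav2vec checkpoint; `models/wav2vec_large.pt` is a hypothetical filename), the downloaded model can be restored through Fairseq's checkpoint utilities:

```python
from fairseq import checkpoint_utils

# Hypothetical path: point this at the checkpoint downloaded from Fairseq.
CKPT = "models/wav2vec_large.pt"

# Returns a list of models plus the saved config and task.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task([CKPT])
wav2vec = models[0]
wav2vec.eval()
```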
Pipeline
Our pipeline consists of the following stages:
- Extract audio features with the wav2vec model
- Build the dataset to train the model
- Train the landmark prediction model
- Evaluate the model
- Visualize the results
1. Audio Feature Extraction
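Continuing from the loading sketch above (and assuming the original wav2vec model, which maps 16 kHz audio to frame-level features at roughly 100 Hz), feature extraction might look like this; `example.wav` is a placeholder filename:

```python
import soundfile as sf
import torch

# Read a mono wav file; the dataset's audio is mono 16 kHz.
wav, sr = sf.read("example.wav", dtype="float32")
assert sr == 16000, "wav2vec expects 16 kHz input"

source = torch.from_numpy(wav).unsqueeze(0)  # (1, num_samples)

with torch.no_grad():
    z = wav2vec.feature_extractor(source)   # local encoder features
    c = wav2vec.feature_aggregator(z)       # context features, (1, 512, T)
```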
2. Building the dataset
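One plausible alignment step, stated as an assumption rather than the paper's exact procedure: since wav2vec features arrive at about 100 Hz and the EMA landmarks at 50 FPS, adjacent feature frames can be averaged in pairs to match the landmark rate:

```python
import numpy as np

def align_features_to_landmarks(feats, landmarks):
    """Downsample ~100 Hz features to 50 FPS and trim to a common length.

    feats:     (T_feat, D) wav2vec features
    landmarks: (T_lmk, K, 3) EMA landmark sequence at 50 FPS
    """
    T = feats.shape[0] // 2
    feats_50fps = feats[: 2 * T].reshape(T, 2, -1).mean(axis=1)  # pair-average
    T_common = min(T, landmarks.shape[0])
    return feats_50fps[:T_common], landmarks[:T_common]
```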
3. Training the model
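The paper's architecture is not reproduced here; as a stand-in, the sketch below trains an assumed single-layer LSTM that regresses 3D landmark coordinates from 512-dim audio features (`train_loader` and `NUM_LANDMARKS` are placeholders):

```python
import torch
import torch.nn as nn

NUM_LANDMARKS = 10  # placeholder; set to the number of EMA sensors

class LandmarkPredictor(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_landmarks=NUM_LANDMARKS):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_landmarks * 3)

    def forward(self, feats):            # feats: (B, T, feat_dim)
        h, _ = self.lstm(feats)
        out = self.head(h)               # (B, T, num_landmarks * 3)
        return out.view(*out.shape[:2], -1, 3)

model = LandmarkPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

for feats, landmarks in train_loader:    # assumed DataLoader of aligned pairs
    optimizer.zero_grad()
    loss = criterion(model(feats), landmarks)
    loss.backward()
    optimizer.step()
```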
4. Testing the model
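A common metric for this kind of task (again an assumption, not necessarily the paper's protocol) is the mean Euclidean distance between predicted and ground-truth landmarks:

```python
import torch

def mean_landmark_error(pred, gt):
    """Mean Euclidean error over all frames and landmarks.

    pred, gt: (B, T, K, 3) tensors in the same units.
    """
    return torch.linalg.norm(pred - gt, dim=-1).mean()
```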
5. Visualizing the results
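A minimal matplotlib sketch (not the project's visualizer) that scatters one frame of predicted 3D landmarks:

```python
import matplotlib.pyplot as plt

def plot_frame(frame):
    """frame: (K, 3) array of 3D landmark positions for a single frame."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(frame[:, 0], frame[:, 1], frame[:, 2])
    ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("z")
    plt.show()
```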
Citation
If you find this work useful in your research, please cite our paper:
@inproceedings{medina2022speechtongue,
title={Speech Driven Tongue Animation},
author={Medina, Salvador and Tomé, Denis and Stoll, Carsten and Tiede, Mark and Munhall, Kevin and Hauptmann, Alex and Matthews, Iain},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022},
organization={IEEE/CVF}
}
License
Our code is released under the MIT License.
The license agreement for the data requires citation of the paper. Please note that citing the dataset URL instead of the publication does not comply with this license agreement.