# GFGE
This repository contains the training and inference code for the paper "Audio-Driven Stylized Gesture Generation with Flow-Based Model".
## Requirements

- Linux OS
- NVIDIA GPUs. We tested on A100 GPUs.
- Python libraries: see `environment.yml`. You can use the following commands with Anaconda3 to create and activate your virtual environment:

```bash
git clone https://github.com/yesheng-THU/GFGE.git
cd GFGE
conda env create -f environment.yml
conda activate GFGE
```
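Once the environment is active, a quick check like the one below can confirm that a GPU is visible. This is only a sketch and assumes the environment provides PyTorch (as MoGlow-derived code typically does); consult `environment.yml` for the actual framework and versions.

```python
# Sanity check after `conda activate GFGE`.
# Assumes environment.yml installs PyTorch; adjust if it pins a different framework.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```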
## Getting started

### Datasets
In this work, we conducted our experiments on two datasets: TED Dataset and Trinity Dataset.
- For the TED Dataset, you can download the raw data from here (16 GB) and extract the ZIP file into `../ted_dataset`. Then you can preprocess the TED Dataset with the following command:

  ```bash
  python data_processing/prepare_deepspeech_gesture_datasets.py
  ```

  The processed data will be placed under the folder `data/locomotion`. We also provide the processed data for training the complete model and the partial data for visualizing the latent space; you can directly download these NPZ files and place them under the folder `data/locomotion` (a quick way to inspect the processed files is sketched after this list).
- For the Trinity Dataset, we used the data to train our models. Trinity College Dublin requires interested parties to sign a license agreement and receive approval before gaining access to this dataset. This is also the same data that was used for the GENEA Challenge 2020. Place the data under the `../trinity_dataset` folder and then run the following command:

  ```bash
  python data_processing/prepare_trinity_datasets.py
  ```

  The processed data will be placed under the folder `data/GENEA`.
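Both preprocessing scripts write NPZ archives. If you want to sanity-check what was produced, a minimal sketch like the one below simply lists the stored arrays; the file name is a placeholder, and the array names inside depend on the preprocessing step.

```python
# Minimal sketch: list the arrays stored in a processed NPZ file.
# The path below is a placeholder -- point it at a file the preprocessing
# step wrote under data/locomotion or data/GENEA.
import numpy as np

archive = np.load("data/locomotion/some_processed_file.npz")  # placeholder path
for name in archive.files:
    print(name, archive[name].shape, archive[name].dtype)
```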
### Feature Extractors

- To successfully train and test our network, you also need to download some auxiliary files. Feature extractors are required to compute the Gesture Perceptual Loss. You can either train your own feature extractors (by running `python scripts/train_gp_loss.py`) or directly download our pretrained feature extractor and extract the ZIP file into `./feature_extractor` (a sketch of the idea behind this loss follows this list).
- To calculate the FGD metric during training and testing, you also need to download a checkpoint (the same one proposed by Yoon et al.) and place it under the folder `./feature_extractor`.
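For intuition, a Gesture Perceptual Loss compares generated and reference motion in the feature space of a pretrained extractor. The snippet below is only an illustration of that idea, not the repository's implementation: the extractor's interface, the input layout, and the choice of L1 distance are all assumptions.

```python
# Illustrative sketch of a feature-space (perceptual) loss between
# generated and ground-truth gesture clips. The real extractor in
# ./feature_extractor may expect a different input layout.
import torch
import torch.nn.functional as F

def gesture_perceptual_loss(feature_extractor, generated, reference):
    """Distance between extractor features of generated and reference motion.

    generated, reference: tensors of shape (batch, frames, pose_dim) -- assumed layout.
    """
    with torch.no_grad():
        ref_feat = feature_extractor(reference)
    gen_feat = feature_extractor(generated)
    return F.l1_loss(gen_feat, ref_feat)  # L1 here is an assumption; L2 is also common
```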
### Model Checkpoints

We provide several pretrained model checkpoints. Download and extract these ZIP files into `./results` (a quick way to inspect a downloaded checkpoint is sketched after the list below).

- Model checkpoints trained on the complete TED Dataset.
- Model checkpoints trained on the Trinity Dataset (full-body motion).
- Model checkpoints trained on the 15-person TED Dataset for latent space visualization.
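If you want to verify that a downloaded checkpoint unpacked correctly, a generic inspection like the one below can help. It assumes the checkpoints are PyTorch files; the file name is a placeholder for whatever the ZIP extracts into `./results`.

```python
# Generic sketch: peek inside a downloaded checkpoint without loading it onto a GPU.
# The path is a placeholder -- substitute a file extracted into ./results.
import torch

ckpt = torch.load("results/some_checkpoint.pt", map_location="cpu")  # placeholder path
if isinstance(ckpt, dict):
    for key in ckpt:
        print(key)  # e.g. model weights, optimizer state, training step
else:
    print(type(ckpt))
```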
## Usage

First, please make sure that all requirements are satisfied and all required files have been downloaded (see the steps above).
### Train

```bash
# train on ted dataset
python scripts/train.py hparams/preferred/locomotion.json locomotion

# train on trinity dataset
python scripts/train.py hparams/preferred/trinity.json trinity
```
### Sample

```bash
# sample on ted dataset
python scripts/test_locomotion_sample.py

# sample on trinity dataset
python scripts/test_trinity_sample.py
```
### Evaluate

```bash
python scripts/cal_metrics.py
```
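Among other metrics, the evaluation reports the FGD (Fréchet Gesture Distance) mentioned above. For reference, FGD is a Fréchet distance between Gaussians fitted to real and generated gesture features (the same form as FID); the snippet below illustrates that computation on generic feature matrices and is not the exact code used by `scripts/cal_metrics.py`.

```python
# Illustration of a Frechet distance between two sets of gesture features,
# each of shape (num_samples, feature_dim). Same formula as FID/FGD, but not
# necessarily the repository's exact implementation.
import numpy as np
from scipy import linalg

def frechet_distance(real_feats, gen_feats):
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):  # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```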
### Latent Space Visualization

```bash
python scripts/vis_latent_space.py
```
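This script produces the latent-space figures for the 15-person TED subset. If you want to plot your own latent codes, a generic 2-D projection like the sketch below gives the same kind of picture; the `codes` and `labels` arrays here are hypothetical stand-ins for whatever latent vectors and speaker IDs you extract from the model.

```python
# Generic 2-D projection of latent codes, colored by speaker/style label.
# `codes` (N, latent_dim) and `labels` (N,) are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
codes = rng.normal(size=(300, 64))       # placeholder latent codes
labels = rng.integers(0, 15, size=300)   # placeholder speaker IDs (15 people)

embedded = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(codes)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab20", s=8)
plt.title("t-SNE of latent codes (illustrative)")
plt.show()
```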
### Style Transfer

```bash
python scripts/style_transfer.py
```
## Results
## Acknowledgement

Note that the training and testing code in this repo relies heavily on MoGlow and GTC. We thank the authors for their great work!