# GFGE
This repository contains the training and inference code for the paper "Audio-Driven Stylized Gesture Generation with Flow-Based Model".
## Requirements

- Linux OS
- NVIDIA GPUs. We tested on A100 GPUs.
- Python libraries: see `environment.yml`. You can use the following commands with Anaconda3 to create and activate your virtual environment:

```bash
git clone https://github.com/yesheng-THU/GFGE.git
cd GFGE
conda env create -f environment.yml
conda activate GFGE
```
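Once the environment is active, a quick check like the one below can confirm that a GPU is visible. This is only a sketch and assumes the environment provides PyTorch (as MoGlow-derived code typically does); consult `environment.yml` for the actual framework and versions.

```python
# Sanity check after `conda activate GFGE`.
# Assumes environment.yml installs PyTorch; adjust if it pins a different framework.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```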
## Getting started

### Datasets
In this work, we conducted our experiments on two datasets: TED Dataset and Trinity Dataset.
- For the TED Dataset, you can download the raw data from here (16 GB) and extract the ZIP file into `../ted_dataset`. Then you can preprocess the TED Dataset with the following command:

  ```bash
  python data_processing/prepare_deepspeech_gesture_datasets.py
  ```

  The processed data will be placed under the folder `data/locomotion`. We also provide the processed data for training the complete model and the partial data for visualizing the latent space; you can directly download these NPZ files and place them under the folder `data/locomotion` (a quick way to inspect the processed files is sketched after this list).
- For the Trinity Dataset, we used the data to train our models. Trinity College Dublin requires interested parties to sign a license agreement and receive approval before gaining access to this dataset. This is also the same data that was used for the GENEA Challenge 2020. Place the data under the `../trinity_dataset` folder and then run the following command:

  ```bash
  python data_processing/prepare_trinity_datasets.py
  ```

  The processed data will be placed under the folder `data/GENEA`.
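Both preprocessing scripts write NPZ archives. If you want to sanity-check what was produced, a minimal sketch like the one below simply lists the stored arrays; the file name is a placeholder, and the array names inside depend on the preprocessing step.

```python
# Minimal sketch: list the arrays stored in a processed NPZ file.
# The path below is a placeholder -- point it at a file the preprocessing
# step wrote under data/locomotion or data/GENEA.
import numpy as np

archive = np.load("data/locomotion/some_processed_file.npz")  # placeholder path
for name in archive.files:
    print(name, archive[name].shape, archive[name].dtype)
```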
### Feature Extractors

- To successfully train and test our network, you also need to download some auxiliary files. Feature extractors are required to compute the Gesture Perceptual Loss. You can either train your own feature extractors (by running `python scripts/train_gp_loss.py`) or directly download our pretrained feature extractor and extract the ZIP file into `./feature_extractor` (a sketch of the idea behind this loss follows this list).
- To calculate the FGD metric during training and testing, you also need to download a checkpoint (the same one proposed by Yoon et al.) and place it under the folder `./feature_extractor`.
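For intuition, a Gesture Perceptual Loss compares generated and reference motion in the feature space of a pretrained extractor. The snippet below is only an illustration of that idea, not the repository's implementation: the extractor's interface, the input layout, and the choice of L1 distance are all assumptions.

```python
# Illustrative sketch of a feature-space (perceptual) loss between
# generated and ground-truth gesture clips. The real extractor in
# ./feature_extractor may expect a different input layout.
import torch
import torch.nn.functional as F

def gesture_perceptual_loss(feature_extractor, generated, reference):
    """Distance between extractor features of generated and reference motion.

    generated, reference: tensors of shape (batch, frames, pose_dim) -- assumed layout.
    """
    with torch.no_grad():
        ref_feat = feature_extractor(reference)
    gen_feat = feature_extractor(generated)
    return F.l1_loss(gen_feat, ref_feat)  # L1 here is an assumption; L2 is also common
```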
### Model Checkpoints

We provide several pretrained model checkpoints. Download and extract these ZIP files into `./results` (a quick way to inspect a downloaded checkpoint is sketched after the list below).

- Model checkpoints trained on the complete TED Dataset.
- Model checkpoints trained on the Trinity Dataset (full-body motion).
- Model checkpoints trained on the 15-person TED Dataset for latent space visualization.
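If you want to verify that a downloaded checkpoint unpacked correctly, a generic inspection like the one below can help. It assumes the checkpoints are PyTorch files; the file name is a placeholder for whatever the ZIP extracts into `./results`.

```python
# Generic sketch: peek inside a downloaded checkpoint without loading it onto a GPU.
# The path is a placeholder -- substitute a file extracted into ./results.
import torch

ckpt = torch.load("results/some_checkpoint.pt", map_location="cpu")  # placeholder path
if isinstance(ckpt, dict):
    for key in ckpt:
        print(key)  # e.g. model weights, optimizer state, training step
else:
    print(type(ckpt))
```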
## Usage

First, please make sure that all requirements are satisfied and all required files have been downloaded (see the steps above).
### Train

```bash
# train on ted dataset
python scripts/train.py hparams/preferred/locomotion.json locomotion

# train on trinity dataset
python scripts/train.py hparams/preferred/trinity.json trinity
```
### Sample

```bash
# sample on ted dataset
python scripts/test_locomotion_sample.py

# sample on trinity dataset
python scripts/test_trinity_sample.py
```
### Evaluate

```bash
python scripts/cal_metrics.py
```
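Among other metrics, the evaluation reports the FGD (Fréchet Gesture Distance) mentioned above. For reference, FGD is a Fréchet distance between Gaussians fitted to real and generated gesture features (the same form as FID); the snippet below illustrates that computation on generic feature matrices and is not the exact code used by `scripts/cal_metrics.py`.

```python
# Illustration of a Frechet distance between two sets of gesture features,
# each of shape (num_samples, feature_dim). Same formula as FID/FGD, but not
# necessarily the repository's exact implementation.
import numpy as np
from scipy import linalg

def frechet_distance(real_feats, gen_feats):
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):  # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```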
### Latent Space Visualization

```bash
python scripts/vis_latent_space.py
```
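This script produces the latent-space figures for the 15-person TED subset. If you want to plot your own latent codes, a generic 2-D projection like the sketch below gives the same kind of picture; the `codes` and `labels` arrays here are hypothetical stand-ins for whatever latent vectors and speaker IDs you extract from the model.

```python
# Generic 2-D projection of latent codes, colored by speaker/style label.
# `codes` (N, latent_dim) and `labels` (N,) are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
codes = rng.normal(size=(300, 64))       # placeholder latent codes
labels = rng.integers(0, 15, size=300)   # placeholder speaker IDs (15 people)

embedded = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(codes)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab20", s=8)
plt.title("t-SNE of latent codes (illustrative)")
plt.show()
```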
### Style Transfer

```bash
python scripts/style_transfer.py
```
## Results
## Acknowledgement

Note that the training and testing code in this repo relies heavily on MoGlow and GTC. We thank the authors for their great work!