BeLFusion

Latent Diffusion for Behavior-Driven Human Motion Prediction (ICCV'23)

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a> arXiv <a href="https://barquerogerman.github.io/BeLFusion/"><img alt="Project" src="https://img.shields.io/badge/-Project%20Page-lightgrey?logo=Google%20Chrome&color=informational&logoColor=white"></a> <br> PWC <br> PWC <br>

[Figure: BeLFusion's architecture]

This repository contains the official PyTorch implementation of the paper:

BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction<br> German Barquero, Sergio Escalera, and Cristina Palmero<br> ICCV 2023<br> [website] [paper] [demo]

Note: our data loaders consider an extra dimension for the number of people in the scene. Since the project aims at single-human motion prediction, this dimension is always 1.
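For reference, here is a minimal sketch of what that shape convention looks like in practice. This is not repo code; the dimension order and all sizes are illustrative assumptions.

```python
import torch

# Hypothetical tensor following the convention described above:
# (batch, n_people, seq_len, n_joints, 3). In this single-human
# setting, the n_people dimension is always 1.
batch = torch.randn(64, 1, 25, 21, 3)  # sizes are illustrative only
poses = batch[:, 0]                    # drop the person dim -> (64, 25, 21, 3)
print(poses.shape)
```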

Installation

1. Environment

<details> <summary>OPTION 1 - Python/conda environment</summary> <p>
conda create -n belfusion python=3.9.5
conda activate belfusion
pip install -r requirements.txt
</p> </details> <details> <summary>OPTION 2 - Docker</summary> <p> We also provide a Dockerfile to build a Docker image with all the required dependencies.

IMPORTANT: This option does not support the visualization script, which requires a GUI. You will still be able to train and evaluate the models.

To build and launch the Docker image, run the following commands from the root of the repository:

docker build . -t belfusion
docker run -it --gpus all --rm --name belfusion \
-v ${PWD}:/project \
belfusion

You should now be in the container, ready to run the code.

</p> </details>
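Whichever option you choose, a quick way to confirm that the environment sees your GPU (plain PyTorch, nothing repo-specific):

```python
import torch

# Sanity check: print the PyTorch version and whether a CUDA GPU
# is visible from inside the environment/container.
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```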

2. Datasets

> Human3.6M

Extract the Poses-D3Positions* folders for S1, S5, S6, S7, S8, S9, S11 into ./datasets/Human36M. Then, run:

python -m data_loader.parsers.h36m
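If the parser complains about missing files, a short check of the expected layout can help. This is a hypothetical helper, not part of the repo, and it assumes one folder per subject directly under the dataset root:

```python
from pathlib import Path

# Verify the extracted Human3.6M subject folders before parsing.
# Subject list taken from the instructions above; the contents of
# each folder depend on your Human3.6M download.
root = Path("./datasets/Human36M")
subjects = ["S1", "S5", "S6", "S7", "S8", "S9", "S11"]
missing = [s for s in subjects if not (root / s).is_dir()]
print("missing subjects:", missing or "none")
```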

> AMASS

Download the SMPL+H G files for 22 datasets: ACCAD, BMLhandball, BMLmovi, BMLrub, CMU, DanceDB, DFaust, EKUT, EyesJapanDataset, GRAB, HDM05, HUMAN4D, HumanEva, KIT, MoSh, PosePrior (MPI_Limits), SFU, SOMA, SSM, TCDHands, TotalCapture, and Transitions. Then, move the tar.bz2 files to ./datasets/AMASS (DO NOT extract them).

Now, download the 'DMPLs for AMASS' from here, and the 'Extended SMPL+H model' from here. Move both extracted folders (dmpls, smplh) to ./auxiliar/body_models. Then, run:

python -m data_loader.parsers.amass --gpu

Note 1: remove the --gpu flag if you do not have a GPU.

Note 2: this step may take a while (~2 hours on CPU, ~20-30 minutes on GPU).
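Before launching the parser, you can sanity-check that all 22 archives are in place with a short script like this (a sketch; it assumes each archive's file name starts with its dataset name, so adjust the patterns if your downloads are named differently, e.g., MPI_Limits for PosePrior):

```python
from pathlib import Path

# Check that the 22 AMASS tar.bz2 archives listed above are present
# (unextracted, as required).
datasets = [
    "ACCAD", "BMLhandball", "BMLmovi", "BMLrub", "CMU", "DanceDB",
    "DFaust", "EKUT", "EyesJapanDataset", "GRAB", "HDM05", "HUMAN4D",
    "HumanEva", "KIT", "MoSh", "PosePrior", "SFU", "SOMA", "SSM",
    "TCDHands", "TotalCapture", "Transitions",
]
root = Path("./datasets/AMASS")
missing = [d for d in datasets if not list(root.glob(f"{d}*.tar.bz2"))]
print("missing archives:", missing or "none")
```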

3. Checkpoints (link)

Replace the folder 'checkpoints' in the root of the repository with the downloaded one. If you want to train the models from scratch, you can skip this step and go to the training section.

Evaluation

Run the following scripts to evaluate BeLFusion and the other state-of-the-art methods.

Human3.6M:

# BeLFusion 
python eval_belfusion.py -c checkpoints/ours/h36m/BeLFusion/final_model/ -i 217 --ema --mode stats --batch_size 512

# Baselines --> {ThePoseKnows, DLow, GSPS, DiverseSampling}
python eval_baseline.py -c checkpoints/baselines/h36m/<BASELINE_NAME>/exp -m stats --batch_size 512

AMASS:

# BeLFusion
python eval_belfusion.py -c checkpoints/ours/amass/BeLFusion/final_model/ -i 1262 --multimodal_threshold 0.4 --ema --mode stats --batch_size 512

# Baselines --> {ThePoseKnows, DLow, GSPS, DiverseSampling}
python eval_baseline.py -c checkpoints/baselines/amass/<BASELINE_NAME>/exp -m stats --batch_size 512 --multimodal_threshold 0.4
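To evaluate all four baselines on one dataset in a single go, a small driver along these lines works (a sketch that just shells out to the commands above):

```python
import subprocess

# Run the baseline evaluation command above for every method on one
# dataset. Sketch only: paths and arguments mirror this section.
dataset = "h36m"  # or "amass"
extra = ["--multimodal_threshold", "0.4"] if dataset == "amass" else []
for name in ["ThePoseKnows", "DLow", "GSPS", "DiverseSampling"]:
    cmd = ["python", "eval_baseline.py",
           "-c", f"checkpoints/baselines/{dataset}/{name}/exp",
           "-m", "stats", "--batch_size", "512", *extra]
    subprocess.run(cmd, check=True)
```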

Visualization


Run the following scripts to visualize the results of BeLFusion and the other state-of-the-art methods (<DATASET> in {h36m, amass}).

# BeLFusion with Human3.6M (press '0' to visualize BeLFusion_D)
python eval_belfusion.py -c checkpoints/ours/h36m/BeLFusion/final_model/ -i 217 --ema --mode vis --batch_size 64 --dstride 10

# BeLFusion with AMASS (press '0' to visualize BeLFusion_D)
python eval_belfusion.py -c checkpoints/ours/amass/BeLFusion/final_model/ -i 1262 --ema --mode vis --batch_size 64 --dstride 10

# Baselines --> {ThePoseKnows, DLow, GSPS, DiverseSampling}
python eval_baseline.py -c checkpoints/baselines/<DATASET>/<BASELINE_NAME>/exp -m vis --batch_size 64

Note: Replace --mode vis with --mode gen to generate the gif animations instead of visualizing them. In this mode, set the argument --store_idx I to store the gifs for denoising step I. For example, set I to 1 for BeLFusion_D's outputs.

Training

To train BeLFusion from scratch, you first need to train the Behavioral Latent Space (BLS) and the observation autoencoder (<DATASET> in {h36m, amass}). Both models can be trained in parallel:

# Observation autoencoder --> 500 epochs
python train_auto.py -c checkpoints/ours/<DATASET>/BeLFusion/final_model/autoencoder_obs/config.json

# BLS --> 2x500 epochs
python train_bls.py -c checkpoints/ours/<DATASET>/BeLFusion/final_model/behavioral_latent_space/config.json

Once they finish, you can train the Latent Diffusion Model (LDM):

# BeLFusion --> 217/1262 epochs for H36M/AMASS
python train_belfusion.py -c checkpoints/ours/<DATASET>/BeLFusion/final_model/config.json
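If you prefer to script the whole pipeline, here is a minimal sketch of the dependency order: both stage-1 jobs in parallel, then the LDM. It assumes your hardware can host both stage-1 runs at once.

```python
import subprocess

# Stage 1: the observation autoencoder and the BLS can train in parallel.
# Stage 2: the LDM starts once both have finished. Sketch only; paths
# mirror the commands in this section.
dataset = "h36m"  # or "amass"
base = f"checkpoints/ours/{dataset}/BeLFusion/final_model"
stage1 = [
    subprocess.Popen(["python", "train_auto.py", "-c", f"{base}/autoencoder_obs/config.json"]),
    subprocess.Popen(["python", "train_bls.py", "-c", f"{base}/behavioral_latent_space/config.json"]),
]
for p in stage1:
    p.wait()  # both must finish before training the LDM
subprocess.run(["python", "train_belfusion.py", "-c", f"{base}/config.json"], check=True)
```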

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{barquero2023belfusion,
  title={BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction},
  author={Barquero, German and Escalera, Sergio and Palmero, Cristina},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

License

The software in this repository is freely available for non-commercial use (see the license for further details).

Note 1: project structure borrowed from @victoresque's template.

Note 2: code under ./models/sota is based on the original implementations of the corresponding papers (DLow, DiverseSampling, and GSPS).