<!--<h2 align="center">Can Language Beat Numerical Regression?<br>Language-Based Multimodal Trajectory Prediction</h2>--> <h2 align="center">Can $\large{\color{Orange}{\textbf{\textsf{Language}}}}$ Beat $\large{\color{MidnightBlue}{\textbf{\textsf{Numerical Regression}}}}$?<br>Language-Based Multimodal Trajectory Prediction </h2> <p align="center"> <a href="https://InhwanBae.github.io/"><strong>Inhwan Bae</strong></a> · <a href="https://leejunoh.com/"><strong>Junoh Lee</strong></a> · <a href="https://scholar.google.com/citations?user=Ei00xroAAAAJ"><strong>Hae-Gon Jeon</strong></a> <br> CVPR 2024 </p> <p align="center"> <a href="https://inhwanbae.github.io/publication/lmtrajectory/"><strong><code>Project Page</code></strong></a> <a href="https://arxiv.org/abs/2403.18447"><strong><code>CVPR Paper</code></strong></a> <a href="https://github.com/InhwanBae/LMTrajectory"><strong><code>Source Code</code></strong></a> <a href="#-citation"><strong><code>Related Works</code></strong></a> </p> <div align='center'> <br><img src="img/lmtraj-model.gif" width=70%> <br>Traditional vs. Our language-based trajectory prediction, LMTraj. </div><br>This repository contains the code for the LMTrajectory framework. <br>TL;DR: Language model-based, Multimodal input, Multimodal output, Multi-task training approach for Zero-shot and Supervised human trajectory prediction.
<br>

## 💬 LMTrajectory Framework 🗨️
- **Prompt-Based Approach**: Moving away from conventional numerical regression models, we reframe the task as prompt-based question answering (see the sketch after this list).
- **Social Reasoning**: Beyond physics-based mathematical interaction modeling, our approach leverages language models to incorporate social reasoning.
- **Multi-Task Training**: Supplementary tasks enhance the model's ability to grasp higher-level context through multi-task training.
- **Numerical Tokenizer**: Our numerical tokenizer effectively separates text and numbers, enabling the model to learn correlations in sequential data.
- **SOTA Performance**: Our holistic solution achieves state-of-the-art results on trajectory prediction benchmarks traditionally dominated by numerical regressors.
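As a rough illustration of the question-answering reframing, the sketch below shows how an observed coordinate sequence might be phrased as a prompt. The wording is a hypothetical placeholder, not the exact template used in the paper or this repository.

```python
# Hypothetical example of the question-answering reframing; the actual prompt
# template is defined in the LMTrajectory source code and may differ.
observed = [(2.3, 4.1), (2.5, 4.3), (2.7, 4.6)]

question = (
    f"A pedestrian has passed through the coordinates {observed} over the last "
    f"{len(observed)} frames. Predict the coordinates of the next 12 frames."
)
# The language model answers with a coordinate sequence as text, which is then
# parsed back into numbers, instead of regressing coordinates directly.
```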
## ❄️ Zero-Shot Evaluation ❄️
### Setup
**Environment**
<br>All models were tested on Ubuntu 20.04 with Python 3.10 and PyTorch 2.0.1 with CUDA 11.7.
Dependencies include Python packages such as `scipy`, `simdkalman` and `openai==0.28.0`.
**Dataset**
<br>Preprocessed ETH and UCY datasets are released in this repository. The train/validation/test splits are the same as those found in Social-GAN.
**Sample**
<br>We provide our zero-shot prediction results in the release section. These results include all multimodal trajectories and are available for use in future zero-shot research.
### Evaluate LMTraj-ZERO
**Preliminary**
<br>To evaluate our LMTraj-ZERO model, you will need an `OPENAI_API_KEY` to access the OpenAI API. Create the API key following the instructions provided by OpenAI, and then paste the key into line 25 of `./zero-shot/chatgpt_trajectory_predictor_v3.py`.
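For reference, with `openai==0.28.0` the key is set on the module and predictions are requested through the chat completion endpoint, roughly as sketched below. The prompt shown here is a hypothetical placeholder; the actual template and decoding parameters live in `chatgpt_trajectory_predictor_v3.py`.

```python
import openai

openai.api_key = "sk-..."  # your OPENAI_API_KEY (pasted at line 25 of the script)

# Hypothetical prompt; the real template is defined in chatgpt_trajectory_predictor_v3.py.
response = openai.ChatCompletion.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system", "content": "You forecast pedestrian trajectories."},
        {"role": "user", "content": "Observed coordinates: (2.3, 4.1), (2.5, 4.3). "
                                    "Predict the next 12 coordinates."},
    ],
)
print(response["choices"][0]["message"]["content"])
```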
**Prediction**
<br>We provide scripts to evaluate our LMTraj-ZERO model on all datasets simultaneously. Two scripts are provided: `./zero-shot/chatgpt_sequential_v3.sh` and `./zero-shot/chatgpt_multi_v3.sh`. The former evaluates the model step-by-step, and the latter uses a thread pool for faster inference.
```bash
# Choose one of the following scripts to evaluate our LMTraj-ZERO model.
./chatgpt_sequential_v3.sh -d <DATASET_ID> -m <LLM_MODEL_ID>
./chatgpt_multi_v3.sh -d <DATASET_ID> -m <LLM_MODEL_ID>

# Supported dataset id: 0 (ETH), 1 (HOTEL), 2 (UNIV), 3 (ZARA1), 4 (ZARA2)
# Supported llm model id: 0 (gpt-3.5-turbo-0301), 1 (gpt-4-0314), 2 (gpt-3.5-turbo-1106), 3 (gpt-4-1106-preview)

# Examples
cd zero-shot
./chatgpt_multi_v3.sh -d 0 -m 3
./chatgpt_multi_v3.sh -d 1 -m 3
```
If an error is encountered, your progress will be saved. When you rerun the same script, it skips the parts that were executed successfully and only regenerates the paths where issues occurred.
If you want to run the model with custom hyperparameters or other models offered by OpenAI, use `./zero-shot/chatgpt_trajectory_predictor_v3.py` directly instead of the script file.
<br>Warning: A misclick could upgrade you to OpenAI Tier 5, as it did for me :(
**Evaluation**
<br>As the final step, we provide code to evaluate the trajectories generated by our LMTraj-ZERO. To evaluate, first combine the predicted trajectories into a single JSON file.
```bash
python ./zero-shot/chatgpt-fragmented_dump_combiner.py --dataset <DATASET_ID> --model <LLM_MODEL_ID>

# Supported dataset id: 0 (ETH), 1 (HOTEL), 2 (UNIV), 3 (ZARA1), 4 (ZARA2)
# Supported llm model id: 0 (gpt-3.5-turbo-0301), 1 (gpt-4-0314), 2 (gpt-3.5-turbo-1106), 3 (gpt-4-1106-preview)

# Examples
python ./zero-shot/chatgpt-fragmented_dump_combiner.py --dataset 0 --model 3
python ./zero-shot/chatgpt-fragmented_dump_combiner.py --dataset 1 --model 3
```
Next, evaluate the combined trajectories using ADE and FDE metrics.
```bash
python ./zero-shot/compute_ade_fde_from_dump.py --dataset <DATASET_ID> --model <LLM_MODEL_ID>

# Supported dataset id: 0 (ETH), 1 (HOTEL), 2 (UNIV), 3 (ZARA1), 4 (ZARA2)
# Supported llm model id: 0 (gpt-3.5-turbo-0301), 1 (gpt-4-0314), 2 (gpt-3.5-turbo-1106), 3 (gpt-4-1106-preview)

# Examples
python ./zero-shot/compute_ade_fde_from_dump.py --dataset 0 --model 3
python ./zero-shot/compute_ade_fde_from_dump.py --dataset 1 --model 3
```
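For clarity, ADE averages the displacement error over all predicted time steps, while FDE measures the error at the final step; for multimodal predictions, the best of the N samples is reported. Below is a minimal reference sketch, assuming predictions are already loaded as arrays (the repository script instead reads the combined JSON dump):

```python
import numpy as np

def best_of_n_ade_fde(pred, gt):
    """pred: (N, T, 2) multimodal predictions, gt: (T, 2) ground-truth trajectory."""
    dist = np.linalg.norm(pred - gt[None], axis=-1)  # (N, T) per-step Euclidean errors
    ade = dist.mean(axis=1).min()                    # average over time, best sample
    fde = dist[:, -1].min()                          # final-step error, best sample
    return ade, fde
```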
**Results**
<table><thead><tr><th rowspan="2"><sub><b>LMTraj-ZERO</b></sub></th><th colspan="2"><sub><b>ETH</b></sub></th><th colspan="2"><sub><b>HOTEL</b></sub></th><th colspan="2"><sub><b>UNIV</b></sub></th><th colspan="2"><sub><b>ZARA1</b></sub></th><th colspan="2"><sub><b>ZARA2</b></sub></th><th colspan="2"><sub><b>AVG</b></sub></th></tr> <tr><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th></tr></thead><tbody> <tr><td><sub><b>gpt-3.5-turbo-0301</b></sub></td><td><sub>1.0668</sub></td><td><sub>1.8241</sub></td><td><sub>0.4229</sub></td><td><sub>0.6538</sub></td><td><sub>0.5570</sub></td><td><sub>0.9836</sub></td><td><sub>0.4715</sub></td><td><sub>0.9073</sub></td><td><sub>0.3878</sub></td><td><sub>0.7056</sub></td><td><sub>0.5812</sub></td><td><sub>1.0149</sub></td></tr> <tr><td><sub><b>gpt-3.5-turbo-1106</b></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub>0.4713</sub></td><td><sub>0.6297</sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td></tr> <tr><td><sub><b>gpt-4-0314</b></sub></td><td><sub>0.7978</sub></td><td><sub>1.6446</sub></td><td><sub>0.2001</sub></td><td><sub>0.3658</sub></td><td><sub>0.3709</sub></td><td><sub>0.7675</sub></td><td><sub>0.3268</sub></td><td><sub>0.6638</sub></td><td><sub>0.2386</sub></td><td><sub>0.4998</sub></td><td><sub>0.3868</sub></td><td><sub>0.7883</sub></td></tr> <tr><td><sub><b>gpt-4-1106-preview</b></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub>0.1757</sub></td><td><sub>0.3279</sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td><td><sub></sub></td></tr></tbody></table>

### Evaluate Algorithmic Models
We provide four algorithmic models for comparison on the zero-shot trajectory prediction task, available in `./zero-shot/algorithmic_model_benchmark.py`. The source code supports four extrapolation methods: stop, linear extrapolation, cubic extrapolation, and a Kalman filter.
```bash
python ./zero-shot/algorithmic_model_benchmark.py --model <MODEL_TYPE>

# Examples
python ./zero-shot/algorithmic_model_benchmark.py --model stop
python ./zero-shot/algorithmic_model_benchmark.py --model linear
python ./zero-shot/algorithmic_model_benchmark.py --model cubic
python ./zero-shot/algorithmic_model_benchmark.py --model kalman
```
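For intuition, the sketch below shows minimal versions of the `stop` and `linear` (constant-velocity) baselines; the repository script may differ in details such as velocity estimation, and the cubic and Kalman variants are not shown.

```python
import numpy as np

def stop_baseline(obs, pred_len=12):
    """'stop' baseline: repeat the last observed position for every future frame."""
    return np.repeat(obs[-1][None], pred_len, axis=0)  # (pred_len, 2)

def linear_baseline(obs, pred_len=12):
    """'linear' baseline: constant-velocity extrapolation from the last observed step."""
    velocity = obs[-1] - obs[-2]                     # displacement of the last step
    steps = np.arange(1, pred_len + 1)[:, None]
    return obs[-1][None] + steps * velocity          # (pred_len, 2)
```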
<br>
## 🔥 Supervised Training & Evaluation 🔥
### Setup
**Environment**
<br>All models were tested on Ubuntu 20.04 with Python 3.10 and PyTorch 2.0.1 with CUDA 11.7.
Dependencies include Python packages such as `transformers`, `accelerate`, `nltk` and `sentencepiece`.
**Dataset**
<br>Preprocessed ETH and UCY datasets are released in this repository. The train/validation/test splits are the same as those found in Social-GAN.
**Preliminary**
<br>We provide preprocessed datasets, pretrained tokenizers, and pretrained models for training and evaluation. Download these files and extract them into the project root folder. This allows you to skip preprocessing and evaluate our LMTraj-SUP model immediately.
We also provide instructions for preprocessing the data and training the model yourself. Follow these steps:
**Dataset Preprocessing**
<br>To maximize GPU utilization and reduce training time, we preprocess the training data. First, generate text descriptions of the dataset environment using the image captioning model located at `./model/imagemodel.py`. This script automatically loads the pretrained model and saves the captions in the `./datasets/image/` folder.
```bash
python ./model/imagemodel.py
```
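To see how such scene captions can be produced, below is a minimal sketch using a public BLIP checkpoint from the `transformers` library; the checkpoint actually used by `./model/imagemodel.py` may differ, and the image path is only an example.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumption: a generic BLIP captioning checkpoint, not necessarily the repo's model.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("./datasets/image/eth.png").convert("RGB")  # example path only
inputs = processor(image, return_tensors="pt")
caption = processor.decode(model.generate(**inputs)[0], skip_special_tokens=True)
print(caption)  # a short text description of the scene
```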
Next, to preprocess all datasets simultaneously, run the `./script/preprocessor.sh` script. This process takes about 2 hours and generates preprocessed JSON files in the `./datasets/preprocessed/` folder.
```bash
./script/preprocessor.sh
```
If you prefer to preprocess the datasets individually, use `./utils/preprocessor.py` instead of the script.
```bash
python ./utils/preprocessor.py --dataset <DATASET_NAME> --phase <TRAINING_PHASE>

# Supported dataset name: eth, hotel, univ, zara1, zara2
# Supported training phase: train, val, test

# Examples
python ./utils/preprocessor.py --dataset eth --phase train
python ./utils/preprocessor.py --dataset hotel --phase val
python ./utils/preprocessor.py --dataset univ --phase test
```
**Tokenizer Training**
<br>Next, train the tokenizer to optimize it for numerical data. You can train the tokenizer yourself using `./utils/tokenizer.py`. This process requires a system with more than 2TB of RAM and takes approximately 12 hours per dataset.
```bash
python ./utils/tokenizer.py --dataset <DATASET_NAME> --model <TOKENIZER_MODEL> --metric <PIXEL_OR_METER>

# Supported dataset name: eth, hotel, univ, zara1, zara2
# Supported tokenizer model type: char, word, unigram, bpe
# Supported metric type: pixel, meter

# Examples
python ./utils/tokenizer.py --dataset eth --model bpe --metric pixel
```
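For intuition, the sketch below shows how a SentencePiece BPE tokenizer could be trained on serialized coordinate text and then used to encode a coordinate pair; the file paths and hyperparameters are illustrative only, and `./utils/tokenizer.py` handles the full dataset-specific pipeline.

```python
import sentencepiece as spm

# Illustrative paths and settings; the real pipeline lives in ./utils/tokenizer.py.
spm.SentencePieceTrainer.train(
    input="./datasets/preprocessed/eth_train_text.txt",  # hypothetical text dump
    model_prefix="eth_bpe_pixel",
    model_type="bpe",        # one of: char, word, unigram, bpe
    vocab_size=1000,
    split_digits=True,       # keep digits separable so numbers tokenize consistently
)

sp = spm.SentencePieceProcessor(model_file="eth_bpe_pixel.model")
print(sp.encode("(23, 41)", out_type=str))
```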
### Train LMTraj-SUP
To train our LMTrajectory model, use `./trainval.py`. We leverage the `accelerate` library to maximize training efficiency. First, configure your system by running `accelerate config` in the shell. You can find detailed instructions in the Accelerate documentation.
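For context, `accelerate` wraps a standard PyTorch training loop so the same script runs on single- or multi-GPU setups. A minimal sketch with a dummy model (the real model, data, and loss are defined in `trainval.py`):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Dummy model and data for illustration only; trainval.py defines the real ones.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = torch.utils.data.DataLoader(torch.randn(64, 16), batch_size=8)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for batch in loader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()
    accelerator.backward(loss)  # replaces loss.backward() for distributed training
    optimizer.step()
```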
To train the model, use the following command:
```bash
accelerate launch trainval.py \
    --cfg ./config/config-pixel.json \
    --dataset eth \
    --tag LMTraj-SUP-eth
```
If you want to train the LMTraj-SUP model on both the ETH and UCY datasets simultaneously, we provide a bash script:
```bash
./script/trainval_all.sh
```
The training process uses 8x NVIDIA RTX 4090 GPUs at 100% utilization and takes approximately 2 to 4 hours. After training, select the best weight file from the checkpoint epochs.
### Evaluate LMTraj-SUP
Finally, to evaluate our LMTrajectory model, use `./trainval.py` again with the `--test` flag. You can conduct both stochastic and deterministic trajectory prediction with a single pretrained weight file.
For stochastic trajectory prediction, use:
```bash
accelerate launch trainval.py \
    --cfg ./config/config-pixel.json \
    --dataset eth \
    --tag LMTraj-SUP-eth \
    --test
```
For deterministic trajectory prediction, use:
```bash
accelerate launch trainval.py \
    --cfg ./config/config-pixel-deterministic.json \
    --dataset eth \
    --tag LMTraj-SUP-eth \
    --test
```
To evaluate our LMTraj-SUP model on both the ETH and UCY datasets simultaneously, we provide the following bash scripts for simplified execution:
```bash
./script/eval_all.sh
./script/eval_all_deterministic.sh
```
**Results**
<table><thead><tr><th rowspan="2"><sub><b>LMTraj-SUP</b></sub></th><th colspan="2"><sub><b>ETH</b></sub></th><th colspan="2"><sub><b>HOTEL</b></sub></th><th colspan="2"><sub><b>UNIV</b></sub></th><th colspan="2"><sub><b>ZARA1</b></sub></th><th colspan="2"><sub><b>ZARA2</b></sub></th><th colspan="2"><sub><b>AVG</b></sub></th></tr> <tr><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th><th><sub><b>ADE</b></sub></th><th><sub><b>FDE</b></sub></th></tr></thead><tbody> <tr><td><sub><b>Deterministic w/ image</b></sub></td><td><sub>0.6549</sub></td><td><sub>1.0377</sub></td><td><sub>0.2640</sub></td><td><sub>0.4583</sub></td><td><sub>0.5715</sub></td><td><sub>1.1579</sub></td><td><sub>0.5119</sub></td><td><sub>1.0066</sub></td><td><sub>0.3802</sub></td><td><sub>0.7408</sub></td><td><sub>0.4765</sub></td><td><sub>0.8803</sub></td></tr> <tr><td><sub><b>Deterministic w/o image</b></sub></td><td><sub>0.6724</sub></td><td><sub>1.2388</sub></td><td><sub>0.2498</sub></td><td><sub>0.4331</sub></td><td><sub>0.5723</sub></td><td><sub>1.1612</sub></td><td><sub>0.5090</sub></td><td><sub>1.0018</sub></td><td><sub>0.3827</sub></td><td><sub>0.7471</sub></td><td><sub>0.4772</sub></td><td><sub>0.9164</sub></td></tr> <tr><td><sub><b>Stochastic w/ image</b></sub></td><td><sub>0.4087</sub></td><td><sub>0.5011</sub></td><td><sub>0.1200</sub></td><td><sub>0.1558</sub></td><td><sub>0.2178</sub></td><td><sub>0.3440</sub></td><td><sub>0.1992</sub></td><td><sub>0.3183</sub></td><td><sub>0.1748</sub></td><td><sub>0.2720</sub></td><td><sub>0.2241</sub></td><td><sub>0.3182</sub></td></tr> <tr><td><sub><b>Stochastic w/o image</b></sub></td><td><sub>0.4106</sub></td><td><sub>0.6188</sub></td><td><sub>0.1212</sub></td><td><sub>0.1595</sub></td><td><sub>0.2188</sub></td><td><sub>0.3465</sub></td><td><sub>0.2018</sub></td><td><sub>0.3225</sub></td><td><sub>0.1756</sub></td><td><sub>0.2760</sub></td><td><sub>0.2256</sub></td><td><sub>0.3447</sub></td></tr></tbody></table>

<br>

## 📖 Citation
If you find this code useful for your research, please cite our trajectory prediction papers :)
💬 LMTrajectory (CVPR'24) 🗨️ | 1️⃣ SingularTrajectory (CVPR'24) 1️⃣ | 🌌 EigenTrajectory (ICCV'23) 🌌 | 🚩 Graph‑TERN (AAAI'23) 🚩 | 🧑🤝🧑 GP‑Graph (ECCV'22) 🧑🤝🧑 | 🎲 NPSN (CVPR'22) 🎲 | 🧶 DMRGCN (AAAI'21) 🧶
```bibtex
@inproceedings{bae2024lmtrajectory,
  title={Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction},
  author={Bae, Inhwan and Lee, Junoh and Jeon, Hae-Gon},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}
```
<details open>
<summary>More Information (Click to expand)</summary>
```bibtex
@inproceedings{bae2024singulartrajectory,
  title={SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model},
  author={Bae, Inhwan and Park, Young-Jae and Jeon, Hae-Gon},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

@inproceedings{bae2023eigentrajectory,
  title={EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting},
  author={Bae, Inhwan and Oh, Jean and Jeon, Hae-Gon},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

@article{bae2023graphtern,
  title={A Set of Control Points Conditioned Pedestrian Trajectory Prediction},
  author={Bae, Inhwan and Jeon, Hae-Gon},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2023}
}

@inproceedings{bae2022gpgraph,
  title={Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction},
  author={Bae, Inhwan and Park, Jin-Hwi and Jeon, Hae-Gon},
  booktitle={Proceedings of the European Conference on Computer Vision},
  year={2022}
}

@inproceedings{bae2022npsn,
  title={Non-Probability Sampling Network for Stochastic Human Trajectory Prediction},
  author={Bae, Inhwan and Park, Jin-Hwi and Jeon, Hae-Gon},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

@article{bae2021dmrgcn,
  title={Disentangled Multi-Relational Graph Convolutional Network for Pedestrian Trajectory Prediction},
  author={Bae, Inhwan and Jeon, Hae-Gon},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2021}
}
```
</details>
<br>