# Track Everything Everywhere Fast and Robustly
Official implementation of the paper *Track Everything Everywhere Fast and Robustly*, ECCV 2024.
Yunzhou Song <sup>1*</sup>, Jiahui Lei <sup>1*</sup>, Ziyun Wang <sup>1</sup>, Lingjie Liu <sup>1</sup>, Kostas Daniilidis <sup>1,2</sup> <br> <sup>1</sup>University of Pennsylvania, <sup>2</sup>Archimedes, Athena RC, <sup>*</sup>equal contribution
Project Page | Paper
## Installation
The code is tested with `python=3.8` and `torch=2.2.0+cu118`.
```bash
git clone --recurse-submodules https://github.com/TimSong412/OmniTrackFast
cd OmniTrackFast
conda create -n omni python=3.8
conda activate omni
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
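If you want to verify that the CUDA 11.8 wheel was picked up before training, a quick check like the following (not part of the repository) can help:

```python
import torch

# Sanity check for the cu118 PyTorch wheel installed above.
print(torch.__version__)          # should report a +cu118 build, e.g. 2.2.0+cu118
print(torch.cuda.is_available())  # should be True on a machine with a compatible GPU
```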
## Training
- Please refer to the preprocessing instructions for preparing input data for training. We also provide some processed data that you can download, unzip, and train on directly.
- With processed input data, run the following command to start training:

```bash
python train.py --config configs/default.txt --data_dir {sequence_directory}
```
You can also train only on flow supervision, without long-term matching, by running:

```bash
python train.py --config configs/nomatch.txt --datadir {sequence_directory}
```
You can view visualizations on TensorBoard by running:

```bash
tensorboard --logdir logs/
```

By default, the script trains for 100k iterations, which takes less than 1 hour on a 2080Ti GPU.
## Evaluation
- Please download the benchmark annotations used by OmniMotion here and our checkpoints from here. Unzip the benchmark directory into the `dataset` directory as `dataset/tapvid_XXXX/annotations`.
- Run the following command to evaluate the checkpoints:

```bash
python run_eval.py
```
## Implementation Details
- Any video sequence shorter than 20 frames should be extended to 20 frames or more; otherwise there are too many NVP blocks to fit such a short sequence. A minimal padding sketch is shown after this list.
- We use `torch.jit` and `torch.autograd.Function` in the non-linear NVP blocks to manually accelerate the forward and backward passes; see the second sketch after this list for the general pattern. You may disable the `jit` config option if any runtime error occurs.
- Generally, a video sequence of 100 frames consumes 5GB of memory on a 2080Ti GPU during optimization.
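A minimal sketch of padding a short sequence, assuming frames are stored as zero-padded, sequentially numbered PNG files and that duplicating the last frame before preprocessing is acceptable (the directory layout and the `pad_sequence` helper below are illustrative, not part of the repository):

```python
import shutil
from pathlib import Path

def pad_sequence(frame_dir: str, min_frames: int = 20) -> None:
    """Duplicate the last frame until the sequence reaches min_frames images."""
    frames = sorted(Path(frame_dir).glob("*.png"))
    if not frames or len(frames) >= min_frames:
        return
    last = frames[-1]
    width = len(last.stem)  # preserve the zero-padded naming scheme
    for i in range(len(frames), min_frames):
        shutil.copy(last, last.with_name(f"{i:0{width}d}.png"))

# Example: extend a 15-frame sequence to 20 frames before preprocessing.
pad_sequence("dataset/my_sequence/color", min_frames=20)
```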
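The custom-Function pattern mentioned above roughly follows the sketch below. This is only an illustration of the technique, not the repository's actual kernel: the coupling math (`x * exp(scale) + shift`) and the names `FusedCoupling` / `_coupling_forward` are made up. The idea is that the forward pass is compiled with `torch.jit.script` while the backward pass is written by hand instead of being traced by autograd:

```python
import torch

@torch.jit.script
def _coupling_forward(x: torch.Tensor, scale: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    # Scripted forward kernel standing in for a non-linear coupling step.
    return x * torch.exp(scale) + shift

class FusedCoupling(torch.autograd.Function):
    """Scripted forward, hand-written backward, as described in the bullet above."""

    @staticmethod
    def forward(ctx, x, scale, shift):
        ctx.save_for_backward(x, scale)
        return _coupling_forward(x, scale, shift)

    @staticmethod
    def backward(ctx, grad_out):
        x, scale = ctx.saved_tensors
        exp_scale = torch.exp(scale)
        return (grad_out * exp_scale,       # d(out)/dx
                grad_out * x * exp_scale,   # d(out)/dscale
                grad_out)                   # d(out)/dshift

# Usage: behaves like a regular differentiable op.
x = torch.randn(8, 3, requires_grad=True)
scale = torch.randn(8, 3, requires_grad=True)
shift = torch.randn(8, 3, requires_grad=True)
FusedCoupling.apply(x, scale, shift).sum().backward()
```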
## Citation
```bibtex
@article{song2024track,
  title={Track Everything Everywhere Fast and Robustly},
  author={Song, Yunzhou and Lei, Jiahui and Wang, Ziyun and Liu, Lingjie and Daniilidis, Kostas},
  journal={arXiv preprint arXiv:2403.17931},
  year={2024}
}
```