

Visual Prompt Multi-Modal Tracking [CVPR2023]

Official implementation of ViPT, including models and the training and testing code.

Models & Raw Results (Google Drive) Models & Raw Results (Baidu Drive: vipt)

<center><img width="75%" alt="" src="assets/abs.png"/></center>

:fire::fire::fire: This work proposes ViPT, a new prompt-tuning framework for multi-modal tracking.


[Mar 20, 2023]

[Feb 28, 2023]


<center><img width="90%" alt="" src="assets/framework.png"/></center>


On RGB-D tracking benchmarks

<center><img width="90%" alt="" src="assets/results_rgbd.PNG"/></center>

On RGB-T tracking benchmarks

<center><img width="90%" alt="" src="assets/results_lasher.png"/></center> <center><img width="90%" alt="" src="assets/results_rgbt234.png"/></center>

On RGB-E tracking benchmark

<center><img width="90%" alt="" src="assets/results_rgbe.png"/></center>



Create and activate a conda environment:

conda create -n vipt python=3.7
conda activate vipt

Install the required packages:

bash install_vipt.sh

Data Preparation

Put the training datasets in ./data/. The directory should look like:

-- data
    -- DepthTrackTraining
        |-- adapter02_indoor
        |-- bag03_indoor
        |-- bag04_indoor
    -- LasHeR/train/trainingset
        |-- 1boygo
        |-- 1handsth
    -- VisEvent/train
        |-- 00142_tank_outdoor2
        |-- 00143_tank_outdoor2
        |-- trainlist.txt
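
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the ViPT codebase: the helper name `missing_train_dirs` is hypothetical, and the directory names are copied from the tree above. If you train on a single modality, you presumably only need the corresponding dataset, so treat a reported gap as a hint rather than an error.

```python
from pathlib import Path

# Expected training roots, copied from the directory tree in this README.
EXPECTED_TRAIN_DIRS = (
    "DepthTrackTraining",
    "LasHeR/train/trainingset",
    "VisEvent/train",
)


def missing_train_dirs(data_root):
    """Return the expected dataset directories that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_TRAIN_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_train_dirs("./data")
    if missing:
        print("Missing dataset directories:", ", ".join(missing))
    else:
        print("All expected dataset directories are present.")
```

Run it from the repository root so that `./data` resolves to the folder described above.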

Path Setting

Run the following command to set paths:

cd <PATH_of_ViPT>
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

You can also modify the paths by editing these two files:

./lib/train/admin/local.py  # paths for training
./lib/test/evaluation/local.py  # paths for testing


Download the pretrained foundation model (OSTrack) and put it under ./pretrained/.

bash train_vipt.sh

You can train models with various modalities and variants by modifying train_vipt.sh.
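
A quick pre-flight check can catch a missing path file or checkpoint before a training run starts. This is a minimal sketch under stated assumptions: `training_preflight` is a hypothetical helper, not a ViPT function; it assumes the path-setting command has produced ./lib/train/admin/local.py, and it makes no assumption about the checkpoint's file name, only that at least one file sits under ./pretrained/.

```python
from pathlib import Path


def training_preflight(repo_root="."):
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(repo_root)
    problems = []

    # Training paths file listed under Path Setting.
    train_paths = root / "lib" / "train" / "admin" / "local.py"
    if not train_paths.is_file():
        problems.append(f"missing {train_paths}; run the path-setting command first")

    # The OSTrack checkpoint goes here; its file name is not assumed.
    pretrained = root / "pretrained"
    if not pretrained.is_dir() or not any(p.is_file() for p in pretrained.iterdir()):
        problems.append(f"no files found under {pretrained}")

    return problems


if __name__ == "__main__":
    for problem in training_preflight("."):
        print("Problem:", problem)
```

An empty result only means the files exist. It does not validate the checkpoint contents or the paths written inside local.py.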


For RGB-D benchmarks

[DepthTrack Test set & VOT22_RGBD]
These two benchmarks are evaluated using VOT-toolkit.
Put the DepthTrack test set in ./Depthtrack_workspace/ and name it 'sequences'.
Download the corresponding test sequences into ./vot22_RGBD_workspace/.

bash eval_rgbd.sh
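
If the DepthTrack test set already lives elsewhere on disk, one way to satisfy the 'sequences' naming requirement before running the script is a symbolic link instead of a copy. The sketch below is illustrative only: `link_sequences` is a hypothetical helper, it assumes a POSIX-like system where creating symlinks needs no special privileges, and it assumes the toolkit follows symlinks. If it does not, move or copy the data instead.

```python
import os
from pathlib import Path


def link_sequences(test_set_dir, workspace_dir):
    """Expose test_set_dir as <workspace_dir>/sequences via a symlink."""
    source = Path(test_set_dir).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"Test set directory not found: {source}")

    workspace = Path(workspace_dir)
    workspace.mkdir(parents=True, exist_ok=True)

    link = workspace / "sequences"
    # Refuse to overwrite anything that is already there.
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists; remove it first")

    os.symlink(source, link, target_is_directory=True)
    return link
```

For example, `link_sequences("/path/to/DepthTrack_test", "./Depthtrack_workspace")` would create ./Depthtrack_workspace/sequences, where the first argument is a placeholder for wherever you stored the test set.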

For RGB-T benchmarks

[LasHeR & RGBT234]
Modify the <DATASET_PATH> and <SAVE_PATH> in ./RGBT_workspace/test_rgbt_mgpus.py, then run:

bash eval_rgbt.sh

For LasHeR evaluation, use the LasHeR Toolkit; for RGBT234 evaluation, use MPR_MSR_Evaluation.

For RGB-E benchmark

Modify the <DATASET_PATH> and <SAVE_PATH> in ./RGBE_workspace/test_rgbe_mgpus.py, then run:

bash eval_rgbe.sh

For evaluation, use VisEvent_SOT_Benchmark.


If you find ViPT helpful for your research, please consider citing:

  title={Visual Prompt Multi-Modal Tracking},
  author={Zhu, Jiawen and Lai, Simiao and Chen, Xin and Wang, Dong and Lu, Huchuan},



If you have any questions, feel free to email jiawen@mail.dlut.edu.cn. ^_^