


The official implementation of the ICML 2024 paper "Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking".

Models & Raw Results, Baidu Drive: avtr
Models & Raw Results, Google Drive


<p align="center"> <img width="85%" src="assets/AVTrack.png" alt="AVTrack"/> </p>



Create and activate a conda environment:

conda create -n AVTrack python=3.8
conda activate AVTrack

Install the required packages:

pip install -r requirement.txt

Data Preparation

Put the tracking datasets in ./data. It should look like:

 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- images
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         |-- TRAIN_11
         |-- TEST
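Before setting paths, it can be useful to confirm that the folders shown above are in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing_dirs` and the `EXPECTED_LAYOUT` dictionary are hypothetical, and only the sample subfolders listed in the tree are checked, not the complete datasets.

```python
from pathlib import Path

# Sample of the expected folders, copied from the tree above.
# This is an illustrative subset, not a full dataset manifest.
EXPECTED_LAYOUT = {
    "lasot": ["airplane", "basketball", "bear"],
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "images"],
    "trackingnet": ["TRAIN_0", "TRAIN_1", "TRAIN_11", "TEST"],
}


def find_missing_dirs(data_root, layout=EXPECTED_LAYOUT):
    """Return relative paths from `layout` that are not directories under `data_root`."""
    root = Path(data_root)
    missing = []
    for dataset, subdirs in layout.items():
        for sub in subdirs:
            if not (root / dataset / sub).is_dir():
                missing.append(f"{dataset}/{sub}")
    return missing


if __name__ == "__main__":
    missing = find_missing_dirs("./data")
    if missing:
        print("Missing directories:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected directories found.")
```

Run it from the repository root so that `./data` resolves to the dataset folder.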

Path Setting

Run the following command to set paths:

cd <PATH_of_AVTrack>
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

You can also modify the paths by editing these two files:

./lib/train/admin/local.py  # paths for training
./lib/test/evaluation/local.py  # paths for testing


Download the pre-trained DeiT-Tiny, Eva02-Tiny, and ViT-Tiny weights and put them under `$USER_ROOT$/.cache/torch/hub/checkpoints/`.

python tracking/train.py --script avtrack --config deit_tiny_patch16_224 --save_dir ./output --mode single
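If training fails while loading the backbone, a quick look at the checkpoint folder can rule out a misplaced download. The sketch below is illustrative only: it assumes `$USER_ROOT$` is the current user's home directory, the function name `list_weight_files` is hypothetical, and the file suffixes are a guess at common weight formats rather than the exact names the trainer expects.

```python
from pathlib import Path

# Assumption: `$USER_ROOT$` refers to the current user's home directory.
CHECKPOINT_DIR = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"


def list_weight_files(checkpoint_dir=CHECKPOINT_DIR, suffixes=(".pth", ".pt", ".bin")):
    """Return sorted names of files in `checkpoint_dir` whose suffix is in `suffixes`.

    The suffix list is an assumption; adjust it to match the downloaded files.
    Returns an empty list if the directory does not exist.
    """
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p.name for p in directory.iterdir() if p.is_file() and p.suffix in suffixes
    )


if __name__ == "__main__":
    files = list_weight_files()
    if files:
        print(f"Found {len(files)} weight file(s) in {CHECKPOINT_DIR}:")
        for name in files:
            print("  " + name)
    else:
        print(f"No weight files found in {CHECKPOINT_DIR}")
```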


Download the model weights from Google Drive or BaiduNetDisk.

Put the downloaded weights in <PATH_of_AVTrack>/output/checkpoints/train/avtrack/deit_tiny_patch16_224.

Set the corresponding values in lib/test/evaluation/local.py to the paths where the benchmarks are actually stored.

Testing examples:

python tracking/test.py avtrack deit_tiny_patch16_224 --dataset uavdt --threads 4 --num_gpus 1
python tracking/analysis_results.py # requires modifying the tracker configs and names first
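To evaluate on several benchmarks in a row, the test command above can be assembled programmatically. This is a rough sketch under stated assumptions: `build_test_command` is a hypothetical helper, it uses only the arguments shown in the example above, and `uavdt` is the only dataset name taken from this README. Any other name added to the list must match what `tracking/test.py` actually accepts.

```python
import shlex


def build_test_command(dataset, tracker="avtrack", config="deit_tiny_patch16_224",
                       threads=4, num_gpus=1):
    """Build the argument list for tracking/test.py, mirroring the example command."""
    return [
        "python", "tracking/test.py", tracker, config,
        "--dataset", dataset,
        "--threads", str(threads),
        "--num_gpus", str(num_gpus),
    ]


if __name__ == "__main__":
    # Only "uavdt" comes from this README; extend the list with dataset names
    # that tracking/test.py is known to support in your checkout.
    datasets = ["uavdt"]
    for name in datasets:
        # Print each command; pass the list to subprocess.run(cmd, check=True)
        # from the repository root to execute it instead.
        print(shlex.join(build_test_command(name)))
```

Printing the commands first makes it easy to review them before launching a long evaluation run.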

Test FLOPs

# Profiling AVTrack
python tracking/profile_model.py --script avtrack --config deit_tiny_patch16_224


Presentation Demo



If our work is useful for your research, please consider citing:

  title={Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking},
  author={Li, Yongxin and Liu, Mengyuan and Wu, You and Wang, Xucheng and Yang, Xiangyang and Li, Shuiwang},
  booktitle={Forty-first International Conference on Machine Learning}