
Multi-source Templates Learning for Real-time Aerial Object Tracking

This is the official code for the paper "Multi-source Templates Learning for Real-time Aerial Object Tracking". In this work, we present an efficient aerial object tracking method via multi-source templates, named MSTL.

Highlights

Real-Time Speed on edge platform.

Our tracker runs at ~200 fps on GPU, ~100 fps on CPU, and ~20 fps on the NVIDIA Jetson Xavier NX platform. With TensorRT acceleration, the speed reaches ~60 fps on the Jetson Xavier NX and ~19 fps on the Jetson Nano.

Unlike previous aerial trackers, which are evaluated on high-end platforms (e.g., the Jetson AGX/Orin series), the proposed tracker runs on extremely cheap edge platforms: the Jetson Nano and the Jetson Xavier NX.

Competitive performance.

| Tracker | Year | Speed (fps) | UAV123 (Prec.) | UAV123@10fps (Prec.) | UAV20L (Prec.) |
| --- | --- | --- | --- | --- | --- |
| Ours | - | 209 | 82.35 | 83.50 | 83.59 |
| TCTrack | CVPR 2022 | 128 | 80.05 | 77.39 | 67.20 |
| HiFT | ICCV 2021 | 137 | 78.70 | 74.87 | 76.32 |

Demo

demo_gif

Quick Start

Environment Preparing

python 3.7.3
pytorch 1.11.0
opencv-python 4.5.5.64

Training

First, you need to set paths for training datasets in lib/train/admin/local.py.
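The exact contents of local.py are generated by the codebase, but in pytracking-style projects the file typically defines an environment-settings class whose attributes point at the dataset roots. A minimal illustrative sketch (the attribute names and paths below are assumptions, not the repository's actual fields):

```python
# lib/train/admin/local.py -- illustrative sketch only; the real attribute
# names are produced by the codebase and may differ.
class EnvironmentSettings:
    def __init__(self):
        # where checkpoints and training logs are written
        self.workspace_dir = '/path/to/workspace'
        # roots of the training datasets (placeholders)
        self.lasot_dir = '/path/to/LaSOT'
        self.got10k_dir = '/path/to/GOT-10k/train'
        self.coco_dir = '/path/to/COCO'
```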

Then, run the following commands for training.

python lib/train/run_training.py

Evaluation

First, you need to set paths for this project in lib/test/evaluation/local.py.

Then, run the following commands for evaluation on four datasets.

python tracking/test.py MSTL MSTL --dataset uav
python tracking/test.py MSTL MSTL --dataset uavl
python tracking/test.py MSTL MSTL --dataset uav10
python tracking/test.py MSTL MSTL --dataset uavd

Trained models and Raw results

The trained models, the training logs, and the raw tracking results are provided in the model zoo.

MSTL framework for other transformer-based trackers.

To use our framework with other transformer-based trackers, we jointly train the original tracker with an additional prediction head. The head takes the outputs of the transformer encoder (or transformer-based structure) as input and directly predicts the bounding box of the target.
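The additional head described above could be sketched as a small MLP that pools the encoder tokens and regresses a normalized box. This is a minimal PyTorch sketch under assumed dimensions (class name, pooling choice, and hidden sizes are illustrative, not the paper's exact design):

```python
import torch
import torch.nn as nn

class BoxPredictionHead(nn.Module):
    """Hypothetical auxiliary head: consumes transformer encoder tokens
    and directly regresses a normalized (cx, cy, w, h) bounding box."""
    def __init__(self, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 4),
        )

    def forward(self, encoder_tokens):
        # encoder_tokens: (batch, num_tokens, embed_dim)
        pooled = encoder_tokens.mean(dim=1)   # simple average over tokens
        return self.mlp(pooled).sigmoid()     # box normalized to [0, 1]

head = BoxPredictionHead()
tokens = torch.randn(2, 1024, 256)  # e.g. a 32x32 search-region feature map
box = head(tokens)
print(box.shape)  # (2, 4): one box per image in the batch
```

During joint training, this head would simply be attached after the encoder and supervised with the same box targets as the original tracker's head.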

As an example, we use TransT (CVPR 2021), which has 4 feature-integration layers (each with 2 self-attention and 2 cross-attention modules), to demonstrate how to implement the proposed decoupling strategy.

TransT

| Tracker | Original Succ. (UAV123) | Original params | Succ. (after decoupling) | Params (after decoupling) |
| --- | --- | --- | --- | --- |
| TransT | 69.1 | 23.0M | - | 16.7M |
| STARK-S | 68.35 | 28.079M | 68.55 | 18.616M |

The corresponding code and pre-trained models will be made available.

About the UAV platform

The platform mainly consists of four parts: a Pixhawk flight controller, an image and data transmission module, a visual camera, and a Jetson Xavier NX onboard computer. The onboard computer obtains the video stream through the USB port. The ground station computer can remotely access the onboard computer and select the target to be tracked via the data link.

Hardware

Acknowledgement