GRM

The official PyTorch implementation of our CVPR 2023 paper:

Generalized Relation Modeling for Transformer Tracking

Shenyuan Gao, Chunluan Zhou, Jun Zhang

[CVF Open Access] [ArXiv Preprint] [YouTube Video] [Trained Models] [Raw Results] [SOTA Paper List]

Highlight

:bookmark:Brief Introduction

Compared with previous two-stream trackers, the recent one-stream tracking pipeline, which allows earlier interaction between the template and search region, has achieved a remarkable performance gain. However, existing one-stream trackers always let the template interact with all parts of the search region throughout every encoder layer, which can lead to target-background confusion when the extracted feature representations are not sufficiently discriminative. To alleviate this issue, we propose generalized relation modeling (GRM) based on adaptive token division. The proposed method is a generalized formulation of attention-based relation modeling for Transformer tracking. It inherits the merits of both the two-stream and one-stream pipelines while enabling more flexible relation modeling by selecting appropriate search tokens to interact with the template tokens.
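The gist of adaptive token division can be sketched in a few lines of PyTorch. The snippet below is only an illustration of the idea, not this repository's actual implementation: the scoring rule, the fixed top-k selection, and the names `divide_tokens` / `relation_modeling` are placeholder assumptions (the real method learns the division end-to-end), and projections/multi-head details are omitted for brevity.

```python
# Minimal sketch (assumptions labeled below), requires PyTorch >= 2.0 for
# F.scaled_dot_product_attention.
import torch
import torch.nn.functional as F

def divide_tokens(search_tokens, ratio=0.5):
    """Pick the fraction of search tokens allowed to interact with the template.
    ASSUMPTION: a mean-feature score stands in for a learned scoring head."""
    b, n, _ = search_tokens.shape
    scores = search_tokens.mean(dim=-1)                 # (b, n) placeholder scores
    k = max(1, int(n * ratio))
    keep = scores.topk(k, dim=1).indices                # indices of selected tokens
    mask = torch.zeros(b, n, dtype=torch.bool, device=search_tokens.device)
    mask.scatter_(1, keep, True)                        # True = may interact with template
    return mask

def relation_modeling(template, search, ratio=0.5):
    """Joint attention over [template; search], where template queries attend
    only to themselves and the selected subset of search tokens."""
    b, nz, _ = template.shape
    nx = search.shape[1]
    selected = divide_tokens(search, ratio)             # (b, nx) boolean
    tokens = torch.cat([template, search], dim=1)       # (b, nz + nx, d)

    # Boolean attention mask, True = attention allowed.
    allow = torch.ones(b, nz + nx, nz + nx, dtype=torch.bool, device=tokens.device)
    # Block template queries from attending to unselected search tokens.
    allow[:, :nz, nz:] = selected.unsqueeze(1)

    # Single-head attention without projections, for brevity only.
    return F.scaled_dot_product_attention(tokens, tokens, tokens, attn_mask=allow)
```

Compared with a plain one-stream encoder (where `allow` would be all-True), the only change is the mask, which is what makes the formulation a generalization of both pipelines: an all-True mask recovers one-stream full interaction, while an all-False template-to-search block recovers two-stream separation.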

:bookmark:Strong Performance

| Variant | GRM-GOT | GRM | GRM-L320 |
| --- | --- | --- | --- |
| Model Config | ViT-B, 256×256 resolution | ViT-B, 256×256 resolution | ViT-L, 320×320 resolution |
| Training Setting | only GOT, 100 epochs | 4 datasets, 300 epochs | 4 datasets, 300 epochs |
| GOT-10k (AO / SR 0.5 / SR 0.75) | 73.4 / 82.9 / 70.4 | - | - |
| LaSOT (AUC / Norm P / P) | - | 69.9 / 79.3 / 75.8 | 71.4 / 81.2 / 77.9 |
| TrackingNet (AUC / Norm P / P) | - | 84.0 / 88.7 / 83.3 | 84.4 / 88.9 / 84.0 |
| AVisT (AUC / OP50 / OP75) | - | 54.5 / 63.1 / 45.2 | 55.1 / 63.8 / 46.9 |
| NfS30 (AUC) | - | 65.6 | 66.0 |
| UAV123 (AUC) | - | 70.2 | 72.2 |

:bookmark:Inference Speed

Our baseline model (backbone: ViT-B, resolution: 256x256) can run at 45 fps (frames per second) on a single NVIDIA GeForce RTX 3090.
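If you want to sanity-check the reported speed on your own hardware, a standard GPU timing loop looks like the sketch below. The model builder and input sizes are placeholders: the 128×128 template is an assumption, and the actual entry point for building and calling GRM should come from this repository's test scripts.

```python
# Generic FPS measurement sketch; `model` and its call signature are assumptions.
import time
import torch

@torch.no_grad()
def measure_fps(model, inputs, warmup=50, iters=200):
    """Time forward passes only, synchronizing so queued CUDA kernels are counted."""
    model.eval()
    for _ in range(warmup):              # warm-up: cuDNN autotuning, allocator churn
        model(*inputs)
    torch.cuda.synchronize()             # drain pending kernels before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        model(*inputs)
    torch.cuda.synchronize()             # wait for the last kernel before stopping the clock
    return iters / (time.perf_counter() - start)

# Hypothetical usage with the baseline 256x256 search region (template size assumed):
# template = torch.randn(1, 3, 128, 128, device='cuda')
# search = torch.randn(1, 3, 256, 256, device='cuda')
# print(f"{measure_fps(model, (template, search)):.1f} fps")
```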

:bookmark:Training Cost

It takes less than half a day to train our baseline model for 300 epochs on 8 NVIDIA GeForce RTX 3090 GPUs (24GB memory each).

Release

Trained Models (including the baseline model GRM, GRM-GOT, and a stronger variant GRM-L320) [download zip file]

Raw Results (including the raw tracking results on the six datasets we benchmarked in the paper and listed above) [download zip file]

Download and unzip these two zip files into the output directory under the GRM project path; our code can then use both of them directly. A small sketch of the unpacking step follows.
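The unpacking can also be done programmatically. In this minimal sketch the archive file names are assumptions, so substitute the names of the zips you actually downloaded:

```python
import zipfile
from pathlib import Path

out = Path("output")        # the output directory under the GRM project path
out.mkdir(exist_ok=True)
# ASSUMED archive names; replace with the actual downloaded file names.
for archive in ("trained_models.zip", "raw_results.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out)  # contents land in output/, ready for the code to use
```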

Let's Get Started

Acknowledgement

:heart::heart::heart:Our implementation builds on the following projects. We really appreciate their excellent open-source work!

Citation

If any part of our paper or code helps your research, please consider citing us and giving our repository a star.

@inproceedings{gao2023generalized,
  title={Generalized Relation Modeling for Transformer Tracking},
  author={Gao, Shenyuan and Zhou, Chunluan and Zhang, Jun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18686--18695},
  year={2023}
}

Contact

If you have any questions or concerns, feel free to open an issue or contact me directly via the contact information on my GitHub homepage. Suggestions and collaborations are also highly welcome!