<div align="center">
<img src="https://github.com/983632847/Awesome-Multimodal-Object-Tracking/blob/main/MMOT.png" width="600">
</div>

# Awesome Multi-modal Object Tracking (MMOT)

A continuously updated project to track the latest progress in multi-modal object tracking.
If this repository can bring you some inspiration, we would feel greatly honored.
If you like our project, please give us a star ⭐ on GitHub.
If you have any suggestions, please feel free to contact: andyzhangchunhui@gmail.com.
We welcome other researchers to submit pull requests and become contributors to this project.
## :collision: Highlights

- 2024.05.30: The WebUOT-1M paper is available on arXiv.
- 2024.05.24: The report of the Awesome MMOT project is available on arXiv and Zhihu (知乎).
- 2024.05.20: The Awesome MMOT project started.
## Contents

- Survey
- Vision-Language Tracking (RGBL Tracking)
- RGBE Tracking
- RGBD Tracking
- RGBT Tracking
- Miscellaneous (RGB+X)
- Awesome Repositories for MMOT
## Citation

If you find our work useful in your research, please consider citing:

    @article{zhang2024awesome,
      title={Awesome Multi-modal Object Tracking},
      author={Zhang, Chunhui and Liu, Li and Wen, Hao and Zhou, Xi and Wang, Yanfeng},
      journal={arXiv preprint arXiv:2405.14200},
      year={2024}
    }
## Survey

- Pengyu Zhang, Dong Wang, Huchuan Lu.<br /> "Multi-modal Visual Tracking: Review and Experimental Comparison." ArXiv (2022). [paper]
- Zhangyong Tang, Tianyang Xu, Xiao-Jun Wu.<br /> "A Survey for Deep RGBT Tracking." ArXiv (2022). [paper]
- Jinyu Yang, Zhe Li, Song Yan, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen, Ling Shao.<br /> "RGBD Object Tracking: An In-depth Review." ArXiv (2022). [paper]
- Chenglong Li, Andong Lu, Lei Liu, Jin Tang.<br /> "Multi-modal Visual Tracking: A Survey (多模态视觉跟踪方法综述)." Journal of Image and Graphics (中国图象图形学报) (2023). [paper]
- Ou Zhou, Ying Ge, Zhang Dawei, Zheng Zhonglong.<br /> "A Survey of RGB-Depth Object Tracking (RGB-D 目标跟踪综述)." Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报) (2024). [paper]
- Zhang, ZhiHao and Wang, Jun and Zang, Zhuli and Jin, Lei and Li, Shengjie and Wu, Hao and Zhao, Jian and Bo, Zhang.<br /> "Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective." ACM Transactions on Multimedia Computing, Communications and Applications (2024). [paper]
- MV-RGBT & MoETrack: Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, He Wang, Pengcheng Shao, Chunyang Cheng, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler.<br /> "Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method." ArXiv (2024). [paper] [code]
- Xingchen Zhang and Ping Ye and Henry Leung and Ke Gong and Gang Xiao.<br /> "Object fusion tracking based on visible and infrared images: A comprehensive review." Information Fusion (2024). [paper]
- Mingzheng Feng and Jianbo Su.<br /> "RGBT tracking: A comprehensive review." Information Fusion (2024). [paper]
- Zhang, Haiping and Yuan, Di and Shu, Xiu and Li, Zhihui and Liu, Qiao and Chang, Xiaojun and He, Zhenyu and Shi, Guangming.<br /> "A Comprehensive Review of RGBT Tracking." IEEE TIM (2024). [paper]
## Vision-Language Tracking (RGBL Tracking)

### Datasets

Dataset | Pub. & Date | Website | Introduction |
---|---|---|---|
OTB99-L | CVPR-2017 | OTB99-L | 99 videos |
LaSOT | CVPR-2019 | LaSOT | 1400 videos |
LaSOT_EXT | IJCV-2021 | LaSOT_EXT | 150 videos |
TNL2K | CVPR-2021 | TNL2K | 2000 videos |
WebUAV-3M | TPAMI-2023 | WebUAV-3M | 4500 videos, 3.3 million frames, UAV tracking, vision-language-audio |
MGIT | NeurIPS-2023 | MGIT | 150 long video sequences, 2.03 million frames, three semantic grains (i.e., action, activity, and story) |
VastTrack | arXiv-2024 | VastTrack | 50,610 video sequences, 4.2 million frames, 2,115 classes |
WebUOT-1M | arXiv-2024 | WebUOT-1M | The first million-scale underwater object tracking dataset, with 1,500 video sequences and 1.1 million frames |
ElysiumTrack-1M | ECCV-2024 | ElysiumTrack-1M | A large-scale dataset that supports three tasks: single object tracking, reference single object tracking, and video reference expression generation, with 1.27 million videos |
VLT-MI | arXiv-2024 | - | A dataset for multi-round, multi-modal interaction, with 3,619 videos. |
### Papers

#### 2024

- MambaTrack: Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang.<br /> "MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking." ArXiv (2024). [paper]
- DMTrack: Guangtong Zhang, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shuxiang Song.<br /> "Diffusion Mask-Driven Visual-language Tracking." IJCAI (2024). [paper]
- SS-VLT: Jiawei Ge, Jiuxin Cao, Xuelin Zhu, Xinyu Zhang, Chang Liu, Kun Wang, Bo Liu.<br /> "Consistencies are All You Need for Semi-supervised Vision-Language Tracking." ACM MM (2024). [paper]
- ALTracker: Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang.<br /> "Autogenic Language Embedding for Coherent Point Tracking." ACM MM (2024). [paper] [code]
- Elysium: Han Wang, Yanjie Wang, Yongjie Ye, Yuxiang Nie, Can Huang.<br /> "Elysium: Exploring Object-level Perception in Videos via MLLM." ECCV (2024). [paper] [code]
- Tapall.ai: Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng.<br /> "1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation." ArXiv (2024). [paper] [code]
- DTLLM-VLT: Xuchen Li, Xiaokun Feng, Shiyu Hu, Meiqi Wu, Dailing Zhang, Jing Zhang, Kaiqi Huang.<br /> "DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM." CVPRW (2024). [paper]
- UVLTrack: Yinchao Ma, Yuyang Tang, Wenfei Yang, Tianzhu Zhang, Jinpeng Zhang, Mengxue Kang.<br /> "Unifying Visual and Vision-Language Tracking via Contrastive Learning." AAAI (2024). [paper] [code]
- QueryNLT: Yanyan Shao, Shuting He, Qi Ye, Yuchao Feng, Wenhan Luo, Jiming Chen.<br /> "Context-Aware Integration of Language and Visual References for Natural Language Tracking." CVPR (2024). [paper] [code]
- OSDT: Guangtong Zhang, Bineng Zhong, Qihua Liang, Zhiyi Mo, Ning Li, Shuxiang Song.<br /> "One-Stream Stepwise Decreasing for Vision-Language Tracking." TCSVT (2024). [paper]
- TTCTrack: Zhongjie Mao, Yucheng Wang, Xi Chen, Jia Yan.<br /> "Textual Tokens Classification for Multi-Modal Alignment in Vision-Language Tracking." ICASSP (2024). [paper]
- MMTrack: Zheng, Yaozong and Zhong, Bineng and Liang, Qihua and Li, Guorong and Ji, Rongrong and Li, Xianxian.<br /> "Toward Unified Token Learning for Vision-Language Tracking." TCSVT (2024). [paper] [code]
- Ping Ye, Gang Xiao, Jun Liu.<br /> "Multimodal Features Alignment for Vision–Language Object Tracking." Remote Sensing (2024). [paper]
#### 2023

- All in One: Chunhui Zhang, Xin Sun, Li Liu, Yiqian Yang, Qiong Liu, Xi Zhou, Yanfeng Wang.<br /> "All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment." ACM MM (2023). [paper] [code]
- CiteTracker: Xin Li, Yuqing Huang, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang.<br /> "CiteTracker: Correlating Image and Text for Visual Tracking." ICCV (2023). [paper] [code]
- JointNLT: Li Zhou, Zikun Zhou, Kaige Mao, Zhenyu He.<br /> "Joint Visual Grounding and Tracking with Natural Language Specification." CVPR (2023). [paper] [code]
- DecoupleTNL: Ma, Ding and Wu, Xiangqian.<br /> "Tracking by Natural Language Specification with Long Short-term Context Decoupling." ICCV (2023). [paper]
- Haojie Zhao, Xiao Wang, Dong Wang, Huchuan Lu, Xiang Ruan.<br /> "Transformer vision-language tracking via proxy token guided cross-modal fusion." PRL (2023). [paper]
- OVLM: Zhang, Huanlong and Wang, Jingchao and Zhang, Jianwei and Zhang, Tianzhu and Zhong, Bineng.<br /> "One-Stream Vision-Language Memory Network for Object Tracking." TMM (2023). [paper] [code]
- SATracker: Jiawei Ge, Xiangmei Chen, Jiuxin Cao, Xuelin Zhu, Bo Liu.<br /> "Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking." ArXiv (2023). [paper]
- VLATrack: Zuo, Jixiang and Wu, Tao and Shi, Meiping and Liu, Xueyan and Zhao, Xijun.<br /> "Multi-Modal Object Tracking with Vision-Language Adaptive Fusion and Alignment." RICAI (2023). [paper]
- VLT_TT: Mingzhe Guo, Zhipeng Zhang, Liping Jing, Haibin Ling, Heng Fan.<br /> "Divert More Attention to Vision-Language Object Tracking." ArXiv (2023). [paper] [code]
#### 2022

- VLT_TT: Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing.<br /> "Divert More Attention to Vision-Language Tracking." NeurIPS (2022). [paper] [code]
- AdaRS: Li, Yihao and Yu, Jun and Cai, Zhongpeng and Pan, Yuwen.<br /> "Cross-modal Target Retrieval for Tracking by Natural Language." CVPR Workshops (2022). [paper]
#### 2021

- SNLT: Qi Feng, Vitaly Ablavsky, Qinxun Bai, Stan Sclaroff.<br /> "Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers." CVPR (2021). [paper] [code]
## RGBE Tracking

### Datasets

Dataset | Pub. & Date | Website | Introduction |
---|---|---|---|
FE108 | ICCV-2021 | FE108 | 108 event videos |
COESOT | arXiv-2022 | COESOT | 1354 RGB-event video pairs |
VisEvent | TC-2023 | VisEvent | 820 RGB-event video pairs |
EventVOT | CVPR-2024 | EventVOT | 1141 event videos |
CRSOT | arXiv-2024 | CRSOT | 1030 RGB-event video pairs |
FELT | arXiv-2024 | FELT | 742 RGB-event video pairs |
MEVDT | arXiv-2024 | MEVDT | 63 multimodal sequences with 13k images, 5M events, 10k object labels and 85 trajectories |
### Papers

#### 2024

- FE-TAP: Jiaxiong Liu, Bo Wang, Zhen Tan, Jinpu Zhang, Hui Shen, Dewen Hu.<br /> "Tracking Any Point with Frame-Event Fusion Network at High Frame Rate." ArXiv (2024). [paper] [code]
- MambaEVT: Xiao Wang, Chao Wang, Shiao Wang, Xixi Wang, Zhicheng Zhao, Lin Zhu, Bo Jiang.<br /> "MambaEVT: Event Stream based Visual Object Tracking using State Space Model." ArXiv (2024). [paper] [code]
- eMoE-Tracker: Yucheng Chen, Lin Wang.<br /> "eMoE-Tracker: Environmental MoE-based Transformer for Robust Event-guided Object Tracking." ArXiv (2024). [paper] [code]
- ED-DCFNet: Raz Ramon, Hadar Cohen-Duwek, Elishai Ezra Tsur.<br /> "ED-DCFNet: An Unsupervised Encoder-decoder Neural Model for Event-driven Feature Extraction and Object Tracking." CVPRW (2024). [paper] [code]
- Mamba-FETrack: Ju Huang, Shiao Wang, Shuai Wang, Zhe Wu, Xiao Wang, Bo Jiang.<br /> "Mamba-FETrack: Frame-Event Tracking via State Space Model." ArXiv (2024). [paper] [code]
- AMTTrack: Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo.<br /> "Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline." ArXiv (2024). [paper] [code]
- TENet: Pengcheng Shao, Tianyang Xu, Zhangyong Tang, Linze Li, Xiao-Jun Wu, Josef Kittler.<br /> "TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking." ArXiv (2024). [paper] [code]
- HDETrack: Xiao Wang, Shiao Wang, Chuanming Tang, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang.<br /> "Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline." CVPR (2024). [paper] [code]
- CRSOT: Yabin Zhu, Xiao Wang, Chenglong Li, Bo Jiang, Lin Zhu, Zhixiang Huang, Yonghong Tian, Jin Tang.<br /> "CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras." ArXiv (2024). [paper] [code]
- CDFI: Jiqing Zhang, Xin Yang, Yingkai Fu, Xiaopeng Wei, Baocai Yin, Bo Dong.<br /> "Object Tracking by Jointly Exploiting Frame and Event Domain." ArXiv (2024). [paper]
- MMHT: Hongze Sun, Rui Liu, Wuque Cai, Jun Wang, Yue Wang, Huajin Tang, Yan Cui, Dezhong Yao, Daqing Guo.<br /> "Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion." ArXiv (2024). [paper]
#### 2023

- Zhiyu Zhu, Junhui Hou, Dapeng Oliver Wu.<br /> "Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers." ICCV (2023). [paper] [code]
- AFNet: Jiqing Zhang, Yuanchen Wang, Wenxi Liu, Meng Li, Jinpeng Bai, Baocai Yin, Xin Yang.<br /> "Frame-Event Alignment and Fusion Network for High Frame Rate Tracking." CVPR (2023). [paper] [code]
- RT-MDNet: Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, Yaowei Wang, Yonghong Tian, Feng Wu.<br /> "VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows." TC (2023). [paper] [code]
#### 2022

- Event-tracking: Zhiyu Zhu, Junhui Hou, Xianqiang Lyu.<br /> "Learning Graph-embedded Key-event Back-tracing for Object Tracking in Event Clouds." NeurIPS (2022). [paper] [code]
- STNet: Jiqing Zhang, Bo Dong, Haiwei Zhang, Jianchuan Ding, Felix Heide, Baocai Yin, Xin Yang.<br /> "Spiking Transformers for Event-based Single Object Tracking." CVPR (2022). [paper] [code]
- CEUTrack: Chuanming Tang, Xiao Wang, Ju Huang, Bo Jiang, Lin Zhu, Jianlin Zhang, Yaowei Wang, Yonghong Tian.<br /> "Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric." ArXiv (2022). [paper] [code]
#### 2021

- CFE: Jiqing Zhang, Kai Zhao, Bo Dong, Yingkai Fu, Yuxin Wang, Xin Yang, Baocai Yin.<br /> "Multi-domain Collaborative Feature Representation for Robust Visual Object Tracking." The Visual Computer (2021). [paper]
## RGBD Tracking

### Datasets

Dataset | Pub. & Date | Website | Introduction |
---|---|---|---|
PTB | ICCV-2013 | PTB | 100 sequences |
STC | TC-2018 | STC | 36 sequences |
CDTB | ICCV-2019 | CDTB | 80 sequences |
VOT-RGBD 2019/2020/2021 | ICCVW-2019 | VOT-RGBD 2019 | VOT-RGBD 2019, 2020, and 2021 are based on CDTB |
DepthTrack | ICCV-2021 | DepthTrack | 200 sequences |
VOT-RGBD 2022 | ECCVW-2022 | VOT-RGBD 2022 | VOT-RGBD 2022 is based on CDTB and DepthTrack |
RGBD1K | AAAI-2023 | RGBD1K | 1,050 sequences, 2.5M frames |
DTTD | CVPR Workshops-2023 | DTTD | 103 scenes, 55691 frames |
ARKitTrack | CVPR-2023 | ARKitTrack | 300 RGB-D sequences, 455 targets, 229.7K video frames |
### Papers

#### 2024

- AMATrack: Ye, Ping and Xiao, Gang and Liu, Jun.<br /> "AMATrack: A Unified Network With Asymmetric Multimodal Mixed Attention for RGBD Tracking." IEEE TIM (2024). [paper]
- SSLTrack: Xue-Feng Zhu, Tianyang Xu, Sara Atito, Muhammad Awais, Xiao-Jun Wu, Zhenhua Feng, Josef Kittler.<br /> "Self-supervised learning for RGB-D object tracking." PR (2024). [paper]
- VADT: Zhang, Guangtong and Liang, Qihua and Mo, Zhiyi and Li, Ning and Zhong, Bineng.<br /> "Visual Adapt for RGBD Tracking." ICASSP (2024). [paper]
- FECD: Xue-Feng Zhu, Tianyang Xu, Xiao-Jun Wu, Josef Kittler.<br /> "Feature enhancement and coarse-to-fine detection for RGB-D tracking." PRL (2024). [paper]
- CDAAT: Xue-Feng Zhu, Tianyang Xu, Xiao-Jun Wu, Zhenhua Feng, Josef Kittler.<br /> "Adaptive Colour-Depth Aware Attention for RGB-D Object Tracking." SPL (2024). [paper] [code]
#### 2023

- SPT: Xue-Feng Zhu, Tianyang Xu, Zhangyong Tang, Zucheng Wu, Haodong Liu, Xiao Yang, Xiao-Jun Wu, Josef Kittler.<br /> "RGBD1K: A Large-scale Dataset and Benchmark for RGB-D Object Tracking." AAAI (2023). [paper] [code]
- EMT: Yang, Jinyu and Gao, Shang and Li, Zhe and Zheng, Feng and Leonardis, Aleš.<br /> "Resource-Efficient RGBD Aerial Tracking." CVPR (2023). [paper] [code]
#### 2022

- Track-it-in-3D: Jinyu Yang, Zhongqun Zhang, Zhe Li, Hyung Jin Chang, Aleš Leonardis, Feng Zheng.<br /> "Towards Generic 3D Tracking in RGBD Videos: Benchmark and Baseline." ECCV (2022). [paper] [code]
- DMTracker: Shang Gao, Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song.<br /> "Learning Dual-Fused Modality-Aware Representations for RGBD Tracking." ECCVW (2022). [paper]
#### 2021

- DeT: Song Yan, Jinyu Yang, Jani Käpylä, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen.<br /> "DepthTrack: Unveiling the Power of RGBD Tracking." ICCV (2021). [paper] [code]
- TSDM: Pengyao Zhao, Quanli Liu, Wei Wang and Qiang Guo.<br /> "TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator." ICPR (2021). [paper] [code]
- 3s-RGBD: Feng Xiao, Qiuxia Wu, Han Huang.<br /> "Single-scale siamese network based RGB-D object tracking with adaptive bounding boxes." Neurocomputing (2021). [paper]
#### 2020

- DAL: Yanlin Qian, Alan Lukezic, Matej Kristan, Joni-Kristian Kämäräinen, Jiri Matas.<br /> "DAL: A Deep Depth-Aware Long-term Tracker." ICPR (2020). [paper] [code]
- RF-CFF: Yong Wang, Xian Wei, Hao Shen, Lu Ding, Jiuqing Wan.<br /> "Robust fusion for RGB-D tracking using CNN features." Applied Soft Computing Journal (2020). [paper]
- SiamOC: Wenli Zhang, Kun Yang, Yitao Xin, Rui Meng.<br /> "An Occlusion-Aware RGB-D Visual Object Tracking Method Based on Siamese Network." ICSP (2020). [paper]
- WCO: Weichun Liu, Xiaoan Tang, Chengling Zhao.<br /> "Robust RGBD Tracking via Weighted Convolution Operators." Sensors (2020). [paper]
#### 2019

- OTR: Ugur Kart, Alan Lukezic, Matej Kristan, Joni-Kristian Kamarainen, Jiri Matas.<br /> "Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters." CVPR (2019). [paper] [code]
- H-FCN: Ming-xin Jiang, Chao Deng, Jing-song Shan, Yuan-yuan Wang, Yin-jie Jia, Xing Sun.<br /> "Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking." Information Fusion (2019). [paper]
- Kuai, Yangliu and Wen, Gongjian and Li, Dongdong and Xiao, Jingjing.<br /> "Target-Aware Correlation Filter Tracking in RGBD Videos." IEEE Sensors Journal (2019). [paper]
- RGBD-OD: Yujun Xie, Yao Lu, Shuang Gu.<br /> "RGB-D Object Tracking with Occlusion Detection." CIS (2019). [paper]
- 3DMS: Alexander Gutev, Carl James Debono.<br /> "Exploiting Depth Information to Increase Object Tracking Robustness." ICST (2019). [paper]
- CA3DMS: Ye Liu, Xiao-Yuan Jing, Jianhui Nie, Hao Gao, Jun Liu, Guo-Ping Jiang.<br /> "Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos." TMM (2019). [paper] [code]
- Depth-CCF: Guanqun Li, Lei Huang, Peichang Zhang, Qiang Li, YongKai Huo.<br /> "Depth Information Aided Constrained correlation Filter for Visual Tracking." GSKI (2019). [paper]
#### 2018

- STC: Jingjing Xiao, Rustam Stolkin, Yuqing Gao, Aleš Leonardis.<br /> "Robust Fusion of Color and Depth Data for RGB-D Target Tracking Using Adaptive Range-Invariant Depth Models and Spatio-Temporal Consistency Constraints." TC (2018). [paper] [code]
- Kart, Uğur and Kämäräinen, Joni-Kristian and Matas, Jiří.<br /> "How to Make an RGBD Tracker?" ECCVW (2018). [paper] [code]
- Jiaxu Leng, Ying Liu.<br /> "Real-Time RGB-D Visual Tracking With Scale Estimation and Occlusion Handling." IEEE Access (2018). [paper]
- DM-DCF: Uğur Kart, Joni-Kristian Kämäräinen, Jiří Matas, Lixin Fan, Francesco Cricri.<br /> "Depth Masked Discriminative Correlation Filter." ICPR (2018). [paper]
- OACPF: Yayu Zhai, Ping Song, Zonglei Mou, Xiaoxiao Chen, Xiongjun Liu.<br /> "Occlusion-Aware Correlation Particle Filter Target Tracking Based on RGBD Data." IEEE Access (2018). [paper]
- RT-KCF: Han Zhang, Meng Cai, Jianxun Li.<br /> "A Real-time RGB-D tracker based on KCF." CCDC (2018). [paper]
#### 2017

- ODIOT: Wei-Long Zheng, Shan-Chun Shen, Bao-Liang Lu.<br /> "Online Depth Image-Based Object Tracking with Sparse Representation and Object Detection." Neural Process Letters (2017). [paper]
- ROTSL: Zi-ang Ma, Zhi-yu Xiang.<br /> "Robust Object Tracking with RGBD-based Sparse Learning." ITEE (2017). [paper]
#### 2016

- DLS: Ning An, Xiao-Guang Zhao, Zeng-Guang Hou.<br /> "Online RGB-D Tracking via Detection-Learning-Segmentation." ICPR (2016). [paper]
- DS-KCF_shape: Sion Hannuna, Massimo Camplani, Jake Hall, Majid Mirmehdi, Dima Damen, Tilo Burghardt, Adeline Paiement, Lili Tao.<br /> "DS-KCF: A Real-time Tracker for RGB-D Data." RTIP (2016). [paper] [code]
- 3D-T: Adel Bibi, Tianzhu Zhang, Bernard Ghanem.<br /> "3D Part-Based Sparse Tracker with Automatic Synchronization and Registration." CVPR (2016). [paper] [code]
- OAPF: Kourosh Meshgi, Shin-ichi Maeda, Shigeyuki Oba, Henrik Skibbe, Yu-zhe Li, Shin Ishii.<br /> "Occlusion Aware Particle Filter Tracker to Handle Complex and Persistent Occlusions." CVIU (2016). [paper]
#### 2015

- CDG: Huizhang Shi, Changxin Gao, Nong Sang.<br /> "Using Consistency of Depth Gradient to Improve Visual Tracking in RGB-D sequences." CAC (2015). [paper]
- DS-KCF: Massimo Camplani, Sion Hannuna, Majid Mirmehdi, Dima Damen, Adeline Paiement, Lili Tao, Tilo Burghardt.<br /> "Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling." BMVC (2015). [paper] [code]
- DOHR: Ping Ding, Yan Song.<br /> "Robust Object Tracking Using Color and Depth Images with a Depth Based Occlusion Handling and Recovery." FSKD (2015). [paper]
- ISOD: Yan Chen, Yingju Shen, Xin Liu, Bineng Zhong.<br /> "3D Object Tracking via Image Sets and Depth-Based Occlusion Detection." SP (2015). [paper]
- OL3DC: Bineng Zhong, Yingju Shen, Yan Chen, Weibo Xie, Zhen Cui, Hongbo Zhang, Duansheng Chen, Tian Wang, Xin Liu, Shujuan Peng, Jin Gou, Jixiang Du, Jing Wang, Wenming Zheng.<br /> "Online Learning 3D Context for Robust Visual Tracking." Neurocomputing (2015). [paper]
#### 2014

- MCBT: Qi Wang, Jianwu Fang, Yuan Yuan.<br /> "Multi-Cue Based Tracking." Neurocomputing (2014). [paper]
#### 2013

- PT: Shuran Song, Jianxiong Xiao.<br /> "Tracking Revisited using RGBD Camera: Unified Benchmark and Baselines." ICCV (2013). [paper] [code]
#### 2012

- Matteo Munaro, Filippo Basso, Emanuele Menegatti.<br /> "Tracking people within groups with RGB-D data." IROS (2012). [paper]
- AMCT: Germán Martín García, Dominik Alexander Klein, Jörg Stückler, Simone Frintrop, Armin B. Cremers.<br /> "Adaptive Multi-cue 3D Tracking of Arbitrary Objects." JDOS (2012). [paper]
## RGBT Tracking

### Datasets

Dataset | Pub. & Date | Website | Introduction |
---|---|---|---|
GTOT | TIP-2016 | GTOT | 50 video pairs, 15K frames |
RGBT210 | ACM MM-2017 | RGBT210 | 210 video pairs |
RGBT234 | PR-2018 | RGBT234 | 234 video pairs, the extension of RGBT210 |
LasHeR | TIP-2021 | LasHeR | 1224 video pairs, 730K frames |
VTUAV | CVPR-2022 | VTUAV | Visible-thermal UAV tracking, 500 sequences, 1.7 million high-resolution frame pairs |
MV-RGBT | arXiv-2024 | MV-RGBT | 122 video pairs, 89.9K frames |
### Papers

#### 2024

- CFBT: Zhirong Zeng, Xiaotao Liu, Meng Sun, Hongyu Wang, Jing Liu.<br /> "Cross Fusion RGB-T Tracking with Bi-directional Adapter." ArXiv (2024). [paper]
- MambaVT: Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu.<br /> "MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking." ArXiv (2024). [paper]
- DFM: Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo.<br /> "RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba." ArXiv (2024). [paper]
- SiamSEA: Zihan Zhuang, Mingfeng Yin, Qi Gao, Yong Lin, Xing Hong.<br /> "SiamSEA: Semantic-aware Enhancement and Associative-attention Dual-Modal Siamese Network for Robust RGBT Tracking." IEEE Access (2024). [paper]
- VLCTrack: Wang, Jiahao and Liu, Fang and Jiao, Licheng and Gao, Yingjia and Wang, Hao and Li, Shuo and Li, Lingling and Chen, Puhua and Liu, Xu.<br /> "Visual and Language Collaborative Learning for RGBT Object Tracking." TCSVT (2024). [paper]
- CAFormer: Yun Xiao, Jiacong Zhao, Andong Lu, Chenglong Li, Yin Lin, Bing Yin, Cong Liu.<br /> "Cross-modulated Attention Transformer for RGBT Tracking." ArXiv (2024). [paper]
- MATI: Kai Li, Lihua Cai, Guangjian He, Xun Gong.<br /> "MATI: Multimodal Adaptive Tracking Integrator for Robust Visual Object Tracking." Sensors (2024). [paper]
- PDAT: Qiao Li, Kanlun Tan, Qiao Liu, Di Yuan, Xin Li, Yunpeng Liu.<br /> "Progressive Domain Adaptation for Thermal Infrared Object Tracking." ArXiv (2024). [paper]
- ReFocus: Lai, Simiao and Liu, Chang and Wang, Dong and Lu, Huchuan.<br /> "Refocus the Attention for Parameter-Efficient Thermal Infrared Object Tracking." TNNLS (2024). [paper]
- MMSTC: Zhang, Tianlu and Jiao, Qiang and Zhang, Qiang and Han, Jungong.<br /> "Exploring Multi-modal Spatial-Temporal Contexts for High-performance RGB-T Tracking." TIP (2024). [paper]
- MELT: Zhangyong Tang, Tianyang Xu, Xiao-Jun Wu, Josef Kittler.<br /> "Multi-Level Fusion for Robust RGBT Tracking via Enhanced Thermal Representation." ACM TOMM (2024). [paper] [code]
- NLMTrack: Miao Yan, Ping Zhang, Haofei Zhang, Ruqian Hao, Juanxiu Liu, Xiaoyang Wang, Lin Liu.<br /> "Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation." ArXiv (2024). [paper] [code]
- Yang Luo, Xiqing Guo, Hao Li.<br /> "From Two-Stream to One-Stream: Efficient RGB-T Tracking via Mutual Prompt Learning and Knowledge Distillation." ArXiv (2024). [paper]
- Qian Zhao, Jun Liu, Junjia Wang, Xingzhong Xiong.<br /> "Real-Time RGBT Target Tracking Based on Attention Mechanism." Electronics (2024). [paper]
- MIGTD: Yujue Cai, Xiubao Sui, Guohua Gu, Qian Chen.<br /> "Multi-modal interaction with token division strategy for RGB-T tracking." PR (2024). [paper]
- GMMT: Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Xiao-Jun Wu, Josef Kittler.<br /> "Generative-based Fusion Mechanism for Multi-Modal Tracking." AAAI (2024). [paper] [code]
- BAT: Bing Cao, Junliang Guo, Pengfei Zhu, Qinghua Hu.<br /> "Bi-directional Adapter for Multi-modal Tracking." AAAI (2024). [paper] [code]
- ProFormer: Yabin Zhu, Chenglong Li, Xiao Wang, Jin Tang, Zhixiang Huang.<br /> "RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning." TCSVT (2024). [paper]
- QueryTrack: Fan, Huijie and Yu, Zhencheng and Wang, Qiang and Fan, Baojie and Tang, Yandong.<br /> "QueryTrack: Joint-Modality Query Fusion Network for RGBT Tracking." TIP (2024). [paper]
- CAT++: Liu, Lei and Li, Chenglong and Xiao, Yun and Ruan, Rui and Fan, Minghao.<br /> "RGBT Tracking via Challenge-Based Appearance Disentanglement and Interaction." TIP (2024). [paper]
- TATrack: Hongyu Wang, Xiaotao Liu, Yifan Li, Meng Sun, Dian Yuan, Jing Liu.<br /> "Temporal Adaptive RGBT Tracking with Modality Prompt." ArXiv (2024). [paper]
- MArMOT: Chenglong Li, Tianhao Zhu, Lei Liu, Xiaonan Si, Zilin Fan, Sulan Zhai.<br /> "Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark." ArXiv (2024). [paper]
- AMNet: Zhang, Tianlu and He, Xiaoyi and Jiao, Qiang and Zhang, Qiang and Han, Jungong.<br /> "AMNet: Learning to Align Multi-modality for RGB-T Tracking." TCSVT (2024). [paper]
- MCTrack: Hu, Xiantao and Zhong, Bineng and Liang, Qihua and Zhang, Shengping and Li, Ning and Li, Xianxian.<br /> "Towards Modalities Correlation for RGB-T Tracking." TCSVT (2024). [paper]
- AFter: Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo.<br /> "AFter: Attention-based Fusion Router for RGBT Tracking." ArXiv (2024). [paper] [code]
- CSTNet: Yunfeng Li, Bo Wang, Ye Li, Zhiwen Yu, Liang Wang.<br /> "Transformer-based RGB-T Tracking with Channel and Spatial Feature Fusion." ArXiv (2024). [paper] [code]
#### 2023

- TBSI: Hui, Tianrui and Xun, Zizheng and Peng, Fengguang and Huang, Junshi and Wei, Xiaoming and Wei, Xiaolin and Dai, Jiao and Han, Jizhong and Liu, Si.<br /> "Bridging Search Region Interaction with Template for RGB-T Tracking." CVPR (2023). [paper] [code]
- DFNet: Jingchao Peng, Haitao Zhao, Zhengwei Hu.<br /> "Dynamic Fusion Network for RGBT Tracking." TITS (2023). [paper] [code]
- CMD: Zhang, Tianlu and Guo, Hongyuan and Jiao, Qiang and Zhang, Qiang and Han, Jungong.<br /> "Efficient RGB-T Tracking via Cross-Modality Distillation." CVPR (2023). [paper]
- DFAT: Zhangyong Tang, Tianyang Xu, Hui Li, Xiao-Jun Wu, Xuefeng Zhu, Josef Kittler.<br /> "Exploring fusion strategies for accurate RGBT visual object tracking." Information Fusion (2023). [paper] [code]
- QAT: Lei Liu, Chenglong Li, Yun Xiao, Jin Tang.<br /> "Quality-Aware RGBT Tracking via Supervised Reliability Learning and Weighted Residual Guidance." ACM MM (2023). [paper]
- GuideFuse: Zhang, Zeyang and Li, Hui and Xu, Tianyang and Wu, Xiao-Jun and Fu, Yu.<br /> "GuideFuse: A Novel Guided Auto-Encoder Fusion Network for Infrared and Visible Images." TIM (2023). [paper]
- MPLT: Yang Luo, Xiqing Guo, Hui Feng, Lei Ao.<br /> "RGB-T Tracking via Multi-Modal Mutual Prompt Learning." ArXiv (2023). [paper] [code]
#### 2022

- HMFT: Pengyu Zhang, Jie Zhao, Dong Wang, Huchuan Lu, Xiang Ruan.<br /> "Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline." CVPR (2022). [paper] [code]
- MFGNet: Xiao Wang, Xiujun Shu, Shiliang Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, Feng Wu.<br /> "MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking." TMM (2022). [paper] [code]
- MBAFNet: Li, Yadong and Lai, Huicheng and Wang, Liejun and Jia, Zhenhong.<br /> "Multibranch Adaptive Fusion Network for RGBT Tracking." IEEE Sensors Journal (2022). [paper]
- AGMINet: Mei, Jiatian and Liu, Yanyu and Wang, Changcheng and Zhou, Dongming and Nie, Rencan and Cao, Jinde.<br /> "Asymmetric Global–Local Mutual Integration Network for RGBT Tracking." TIM (2022). [paper]
- APFNet: Yun Xiao, Mengmeng Yang, Chenglong Li, Lei Liu, Jin Tang.<br /> "Attribute-Based Progressive Fusion Network for RGBT Tracking." AAAI (2022). [paper] [code]
- DMCNet: Lu, Andong and Qian, Cun and Li, Chenglong and Tang, Jin and Wang, Liang.<br /> "Duality-Gated Mutual Condition Network for RGBT Tracking." TNNLS (2022). [paper]
- TFNet: Zhu, Yabin and Li, Chenglong and Tang, Jin and Luo, Bin and Wang, Liang.<br /> "RGBT Tracking by Trident Fusion Network." TCSVT (2022). [paper]
- Mingzheng Feng, Jianbo Su.<br /> "Learning reliable modal weight with transformer for robust RGBT tracking." KBS (2022). [paper]
#### 2021

- JMMAC: Zhang, Pengyu and Zhao, Jie and Bo, Chunjuan and Wang, Dong and Lu, Huchuan and Yang, Xiaoyun.<br /> "Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking." TIP (2021). [paper] [code]
- ADRNet: Pengyu Zhang, Dong Wang, Huchuan Lu, Xiaoyun Yang.<br /> "Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking." IJCV (2021). [paper] [code]
- SiamCDA: Zhang, Tianlu and Liu, Xueru and Zhang, Qiang and Han, Jungong.<br /> "SiamCDA: Complementarity- and distractor-aware RGB-T tracking based on Siamese network." TCSVT (2021). [paper] [code]
- Wang, Yong and Wei, Xian and Tang, Xuan and Shen, Hao and Zhang, Huanlong.<br /> "Adaptive Fusion CNN Features for RGBT Object Tracking." TITS (2021). [paper]
- M5L: Zhengzheng Tu, Chun Lin, Chenglong Li, Jin Tang, Bin Luo.<br /> "M5L: Multi-Modal Multi-Margin Metric Learning for RGBT Tracking." TIP (2021). [paper]
- CBPNet: Qin Xu, Yiming Mei, Jinpei Liu, Chenglong Li.<br /> "Multimodal Cross-Layer Bilinear Pooling for RGBT Tracking." TMM (2021). [paper]
- MANet++: Andong Lu, Chenglong Li, Yuqing Yan, Jin Tang, Bin Luo.<br /> "RGBT Tracking via Multi-Adapter Network with Hierarchical Divergence Loss." TIP (2021). [paper]
- CMR: Li, Chenglong and Xiang, Zhiqiang and Tang, Jin and Luo, Bin and Wang, Futian.<br /> "RGBT Tracking via Noise-Robust Cross-Modal Ranking." TNNLS (2021). [paper]
- GCMP: Rui Yang, Xiao Wang, Chenglong Li, Jinmin Hu, Jin Tang.<br /> "RGBT tracking via cross-modality message passing." Neurocomputing (2021). [paper]
- HDINet: Mei, Jiatian and Zhou, Dongming and Cao, Jinde and Nie, Rencan and Guo, Yanbu.<br /> "HDINet: Hierarchical Dual-Sensor Interaction Network for RGBT Tracking." IEEE Sensors Journal (2021). [paper]
#### 2020

- CMPP: Chaoqun Wang, Chunyan Xu, Zhen Cui, Ling Zhou, Tong Zhang, Xiaoya Zhang, Jian Yang.<br /> "Cross-Modal Pattern-Propagation for RGB-T Tracking." CVPR (2020). [paper]
- CAT: Chenglong Li, Lei Liu, Andong Lu, Qing Ji, Jin Tang.<br /> "Challenge-Aware RGBT Tracking." ECCV (2020). [paper]
- FANet: Yabin Zhu, Chenglong Li, Bin Luo, Jin Tang.<br /> "FANet: Quality-Aware Feature Aggregation Network for Robust RGB-T Tracking." TIV (2020). [paper]
#### 2019

- mfDiMP: Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost van de Weijer, Fahad Shahbaz Khan.<br /> "Multi-Modal Fusion for End-to-End RGB-T Tracking." ICCVW (2019). [paper] [code]
- DAPNet: Yabin Zhu, Chenglong Li, Bin Luo, Jin Tang, Xiao Wang.<br /> "Dense Feature Aggregation and Pruning for RGBT Tracking." ACM MM (2019). [paper]
- DAFNet: Yuan Gao, Chenglong Li, Yabin Zhu, Jin Tang, Tao He, Futian Wang.<br /> "Deep Adaptive Fusion Network for High Performance RGBT Tracking." ICCVW (2019). [paper] [code]
- MANet: Chenglong Li, Andong Lu, Aihua Zheng, Zhengzheng Tu, Jin Tang.<br /> "Multi-Adapter RGBT Tracking." ICCV (2019). [paper] [code]
## Miscellaneous (RGB+X)

### Datasets

Dataset | Pub. & Date | Website | Introduction |
---|---|---|---|
WebUAV-3M | TPAMI-2023 | WebUAV-3M | 4500 videos, 3.3 million frames, UAV tracking, Vision-language-audio |
UniMod1K | IJCV-2024 | UniMod1K | 1050 video pairs, 2.5 million frames, Vision-depth-language |
### Papers

#### 2024

- MixRGBX: Meng Sun and Xiaotao Liu and Hongyu Wang and Jing Liu.<br /> "MixRGBX: Universal multi-modal tracking with symmetric mixed attention." Neurocomputing (2024). [paper]
- XTrack: Yuedong Tan, Zongwei Wu, Yuqian Fu, Zhuyun Zhou, Guolei Sun, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte.<br /> "Towards a Generalist and Blind RGB-X Tracker." ArXiv (2024). [paper] [code]
- OneTracker: Lingyi Hong, Shilin Yan, Renrui Zhang, Wanyun Li, Xinyu Zhou, Pinxue Guo, Kaixun Jiang, Yiting Chen, Jinglun Li, Zhaoyu Chen, Wenqiang Zhang.<br /> "OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning." CVPR (2024). [paper]
- SDSTrack: Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, Junhao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu.<br /> "SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking." CVPR (2024). [paper] [code]
- Un-Track: Zongwei Wu, Jilai Zheng, Xiangxuan Ren, Florin-Alexandru Vasluianu, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte.<br /> "Single-Model and Any-Modality for Video Object Tracking." CVPR (2024). [paper] [code]
- ELTrack: Alansari, Mohamad and Alnuaimi, Khaled and Alansari, Sara and Werghi, Naoufel and Javed, Sajid.<br /> "ELTrack: Correlating Events and Language for Visual Tracking." ArXiv (2024). [paper] [code]
- KSTrack: He, Yuhang and Ma, Zhiheng and Wei, Xing and Gong, Yihong.<br /> "Knowledge Synergy Learning for Multi-Modal Tracking." TCSVT (2024). [paper]
- SeqTrackv2: Xin Chen, Ben Kang, Jiawen Zhu, Dong Wang, Houwen Peng, Huchuan Lu.<br /> "Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking." ArXiv (2024). [paper] [code]
#### 2023

- ViPT: Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu.<br /> "Visual Prompt Multi-Modal Tracking." CVPR (2023). [paper] [code]
#### 2022

- ProTrack: Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song.<br /> "Prompting for Multi-Modal Tracking." ACM MM (2022). [paper]
### Others

#### 2024

- BihoT: Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du.<br /> "BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking." ArXiv (2024). [paper]
- SCANet: Yunfeng Li, Bo Wang, Jiuran Sun, Xueyi Wu, Ye Li.<br /> "RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker." ArXiv (2024). [paper] [code]
## Awesome Repositories for MMOT

- Vision-Language_Tracking_Paper_List
- VisEvent_SOT_Benchmark
- RGBD-tracking-review
- Datasets-and-benchmark-code
- RGBT-Tracking-Results-Datasets-and-Methods
- Multimodal-Tracking-Survey
## License

This project is released under the MIT license. Please see the LICENSE file for more information.