
<div align="center"> <img src="https://github.com/983632847/Awesome-Multimodal-Object-Tracking/blob/main/MMOT.png" width="600">

Awesome Multi-modal Object Tracking (MMOT)


<p align="center"> </p> </div>


A continuously updated project to track the latest progress in multi-modal object tracking.

We would be honored if this repository gives you some inspiration.

If you like our project, please give us a star ⭐ on GitHub.

If you have any suggestions, please feel free to contact us at andyzhangchunhui@gmail.com.

We welcome other researchers to submit pull requests and become contributors to this project.

:collision: Highlights

Last Updated

Contents

Citation

If you find our work useful in your research, please consider citing:

@article{zhang2024awesome,
  title={Awesome Multi-modal Object Tracking},
  author={Zhang, Chunhui and Liu, Li and Wen, Hao and Zhou, Xi and Wang, Yanfeng},
  journal={arXiv preprint arXiv:2405.14200},
  year={2024}
}

Survey

Vision-Language Tracking

Datasets

| Dataset | Pub. & Date | Website | Introduction |
| --- | --- | --- | --- |
| OTB99-L | CVPR-2017 | OTB99-L | 99 videos |
| LaSOT | CVPR-2019 | LaSOT | 1400 videos |
| LaSOT_EXT | IJCV-2021 | LaSOT_EXT | 150 videos |
| TNL2K | CVPR-2021 | TNL2K | 2000 videos |
| WebUAV-3M | TPAMI-2023 | WebUAV-3M | 4500 videos, 3.3 million frames, UAV tracking, vision-language-audio |
| MGIT | NeurIPS-2023 | MGIT | 150 long video sequences, 2.03 million frames, three semantic grains (i.e., action, activity, and story) |
| VastTrack | arXiv-2024 | VastTrack | 50,610 video sequences, 4.2 million frames, 2,115 classes |
| WebUOT-1M | arXiv-2024 | WebUOT-1M | The first million-scale underwater object tracking dataset contains 1,500 video sequences, 1.1 million frames |
| ElysiumTrack-1M | ECCV-2024 | ElysiumTrack-1M | A large-scale dataset that supports three tasks: single object tracking, reference single object tracking, and video reference expression generation, with 1.27 million videos |
| VLT-MI | arXiv-2024 | - | A dataset for multi-round, multi-modal interaction, with 3,619 videos. |

Papers

2024

2023

2022

2021

RGBE Tracking

Datasets

| Dataset | Pub. & Date | Website | Introduction |
| --- | --- | --- | --- |
| FE108 | ICCV-2021 | FE108 | 108 event videos |
| COESOT | arXiv-2022 | COESOT | 1354 RGB-event video pairs |
| VisEvent | TC-2023 | VisEvent | 820 RGB-event video pairs |
| EventVOT | CVPR-2024 | EventVOT | 1141 event videos |
| CRSOT | arXiv-2024 | CRSOT | 1030 RGB-event video pairs |
| FELT | arXiv-2024 | FELT | 742 RGB-event video pairs |
| MEVDT | arXiv-2024 | MEVDT | 63 multimodal sequences with 13k images, 5M events, 10k object labels and 85 trajectories |

Papers

2024

2023

2022

2021

RGBD Tracking

Datasets

| Dataset | Pub. & Date | Website | Introduction |
| --- | --- | --- | --- |
| PTB | ICCV-2013 | PTB | 100 sequences |
| STC | TC-2018 | STC | 36 sequences |
| CDTB | ICCV-2019 | CDTB | 80 sequences |
| VOT-RGBD 2019/2020/2021 | ICCVW-2019 | VOT-RGBD 2019 | VOT-RGBD 2019, 2020, and 2021 are based on CDTB |
| DepthTrack | ICCV-2021 | DepthTrack | 200 sequences |
| VOT-RGBD 2022 | ECCVW-2022 | VOT-RGBD 2022 | VOT-RGBD 2022 is based on CDTB and DepthTrack |
| RGBD1K | AAAI-2023 | RGBD1K | 1,050 sequences, 2.5M frames |
| DTTD | CVPR Workshops-2023 | DTTD | 103 scenes, 55691 frames |
| ARKitTrack | CVPR-2023 | ARKitTrack | 300 RGB-D sequences, 455 targets, 229.7K video frames |

Papers

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

RGBT Tracking

Datasets

| Dataset | Pub. & Date | Website | Introduction |
| --- | --- | --- | --- |
| GTOT | TIP-2016 | GTOT | 50 video pairs, 15K frames |
| RGBT210 | ACM MM-2017 | RGBT210 | 210 video pairs |
| RGBT234 | PR-2018 | RGBT234 | 234 video pairs, an extension of RGBT210 |
| LasHeR | TIP-2021 | LasHeR | 1224 video pairs, 730K frames |
| VTUAV | CVPR-2022 | VTUAV | Visible-thermal UAV tracking, 500 sequences, 1.7 million high-resolution frame pairs |
| MV-RGBT | arXiv-2024 | MV-RGBT | 122 video pairs, 89.9K frames |

Papers

2024

2023

2022

2021

2020

2019

Miscellaneous

Datasets

| Dataset | Pub. & Date | Website | Introduction |
| --- | --- | --- | --- |
| WebUAV-3M | TPAMI-2023 | WebUAV-3M | 4500 videos, 3.3 million frames, UAV tracking, vision-language-audio |
| UniMod1K | IJCV-2024 | UniMod1K | 1050 video pairs, 2.5 million frames, vision-depth-language |

Papers

2024

2023

2022

Others

2024

Awesome Repositories for MMOT

License

This project is released under the MIT license. Please see the LICENSE file for more information.