LasHeR: A Large-scale High-diversity Benchmark for RGBT Tracking

Motivation

RGBT tracking has received a surge of interest in the computer vision community, but the field lacks a large-scale, high-diversity benchmark dataset, which is essential both for training deep RGBT trackers and for comprehensively evaluating RGBT tracking methods. To this end, we present LasHeR, a Large-scale High-diversity benchmark for RGBT tracking.

About LasHeR benchmark

LasHeR consists of 1224 visible and thermal infrared video pairs with more than 730K frame pairs in total. Each frame pair is spatially aligned and manually annotated with a bounding box, making the dataset densely annotated. LasHeR is highly diverse: it is captured across a broad range of object categories, camera viewpoints, scene complexities and environmental factors, spanning seasons, weather conditions, and day and night. Motivated by real-world applications, several new challenges are taken into consideration in data creation.

Comparison of LasHeR with public RGBT datasets

Attributes

Attr  Description
NO    No Occlusion - the target is not occluded.
PO    Partial Occlusion - the target object is partially occluded.
TO    Total Occlusion - the target object is totally occluded.
HO    Hyaline Occlusion - the target is occluded by a hyaline object.
OV    Out-of-View - the target leaves the camera field of view.
LI    Low Illumination - the illumination in the target region is low.
HI    High Illumination - the illumination in the target region is too strong for the target to be identified.
AIV   Abrupt Illumination Variation - the illumination of the target changes significantly.
LR    Low Resolution - the resolution in the target region is low.
DEF   Deformation - non-rigid object deformation.
BC    Background Clutter - the background around the target object is cluttered.
SA    Similar Appearance - there are objects of similar shape near the target.
TC    Thermal Crossover - the target has a similar temperature to other objects or the surrounding background.
MB    Motion Blur - the motion of the target object blurs the image.
CM    Camera Moving - the target object is captured by a moving camera.
FL    Frame Lost - some thermal or visible frames are lost.
FM    Fast Motion - the motion of the ground truth between two adjacent frames is larger than 20 pixels.
SV    Scale Variation - the ratio of the first bounding box to the current bounding box is outside the range [0.5, 2].
ARC   Aspect Ratio Change - the ratio of the bounding box aspect is outside the range [0.5, 2].
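
The three threshold-based attributes (FM, SV, ARC) can be checked directly from ground-truth boxes. The sketch below is only an illustration and rests on assumptions not stated above: boxes are `(x, y, w, h)` tuples, "motion" is the displacement of the box centre, the SV ratio is taken over box areas, and the ARC ratio compares the first frame's width/height ratio with the current one. The function names are hypothetical and not part of any LasHeR toolkit.

```python
import math


def center(box):
    """Centre (cx, cy) of a box given as (x, y, w, h)."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def is_fast_motion(prev_box, curr_box, threshold=20.0):
    """FM: centre displacement between adjacent frames exceeds `threshold` pixels.

    Assumption: motion is measured between box centres.
    """
    (px, py), (cx, cy) = center(prev_box), center(curr_box)
    return math.hypot(cx - px, cy - py) > threshold


def outside_range(ratio, low=0.5, high=2.0):
    """True when `ratio` lies outside the closed interval [low, high]."""
    return ratio < low or ratio > high


def is_scale_variation(first_box, curr_box):
    """SV: ratio of first-box area to current-box area is outside [0.5, 2].

    Assumption: the 'ratio of bounding boxes' is an area ratio.
    """
    curr_area = curr_box[2] * curr_box[3]
    if curr_area <= 0:
        return False  # degenerate box, no ratio defined
    return outside_range(first_box[2] * first_box[3] / curr_area)


def is_aspect_ratio_change(first_box, curr_box):
    """ARC: ratio of first to current width/height ratio is outside [0.5, 2].

    Assumption: the aspect ratio is compared against the first frame.
    """
    if min(first_box[3], curr_box[2], curr_box[3]) <= 0:
        return False  # degenerate box, no ratio defined
    first_aspect = first_box[2] / first_box[3]
    curr_aspect = curr_box[2] / curr_box[3]
    return outside_range(first_aspect / curr_aspect)
```

For example, a target whose box moves from `(0, 0, 10, 10)` to `(30, 0, 10, 10)` between adjacent frames has a centre displacement of 30 pixels and would be flagged as FM under these assumptions.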

Dataset file structure

sequence
├─infrared
│────i000.jpg
│────i001.jpg
│────i002.jpg

├─visible
│────v000.jpg
│────v001.jpg
│────v002.jpg

├─infrared.txt
├─init.txt
└─visible.txt
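
Given this layout, one sequence could be loaded roughly as follows. This is a minimal sketch under assumptions: the format of the three `.txt` files is not specified above, so the code assumes one box per line as four numbers separated by commas or whitespace, and it pairs frames by the number embedded in each file name. `load_sequence`, `read_boxes` and `frame_index` are hypothetical helper names.

```python
from pathlib import Path


def frame_index(path):
    """Numeric part of a frame file name, e.g. 'i002.jpg' -> 2."""
    return int("".join(ch for ch in path.stem if ch.isdigit()))


def read_boxes(path):
    """Parse one box per line (assumed: four numbers, comma or space separated)."""
    boxes = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = [float(v) for v in line.replace(",", " ").split()]
        boxes.append(tuple(values[:4]))
    return boxes


def load_sequence(seq_dir):
    """Collect (visible, infrared) frame pairs and the annotation files."""
    seq_dir = Path(seq_dir)
    infrared = sorted((seq_dir / "infrared").glob("*.jpg"), key=frame_index)
    visible = sorted((seq_dir / "visible").glob("*.jpg"), key=frame_index)
    if len(infrared) != len(visible):
        raise ValueError("visible and infrared frame counts differ")
    return {
        "frames": list(zip(visible, infrared)),
        "visible_boxes": read_boxes(seq_dir / "visible.txt"),
        "infrared_boxes": read_boxes(seq_dir / "infrared.txt"),
        "init_boxes": read_boxes(seq_dir / "init.txt"),
    }
```

Sorting by the numeric index instead of the raw file name keeps the frame order correct even if the numbers are not zero-padded to a fixed width.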

Evaluation on LasHeR

We evaluate 12 RGBT tracking algorithms on the entire LasHeR dataset to provide a comprehensive platform for performance analysis. The deep RGBT trackers include MANet, DAPNet, MaCNet, DAFNet, FANet, MANet++, DMCNet and mfDiMP. The RGBT trackers based on handcrafted features include SGT, CMR and SGT++.

Retraining experiment on LasHeR

We split LasHeR into training and testing subsets according to the target class distribution. We then retrain MANet and mfDiMP on the training set to demonstrate how deep RGBT trackers can be improved using a large-scale training set.
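
One common way to split by class distribution is a stratified split, where each target class contributes about the same share of sequences to the test subset. The sketch below illustrates that general idea only. It is not the procedure used to produce the official LasHeR split, and the `test_fraction` value, the `seed` and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(seq_classes, test_fraction=0.2, seed=0):
    """Split sequence names so each class keeps roughly the same test share.

    seq_classes maps a sequence name to its target class label.
    Returns (train_names, test_names).
    """
    by_class = defaultdict(list)
    for name, cls in sorted(seq_classes.items()):
        by_class[cls].append(name)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for cls in sorted(by_class):
        names = by_class[cls]
        rng.shuffle(names)
        n_test = round(len(names) * test_fraction)
        test.extend(names[:n_test])
        train.extend(names[n_test:])
    return train, test
```

Splitting per class rather than over the whole pool keeps rare target classes represented in both subsets in proportion to their frequency.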

Dataset