
<div align="center"> <img src="https://github.com/Event-AHU/EventVOT_Benchmark/blob/main/figures/EventVOT_white.png" width="600">

The First High Definition (HD) Event-based Visual Object Tracking Benchmark Dataset


<p align="center"> • <a href="">arXiv</a> • <a href="">Baselines</a> • <a href="">DemoVideo</a> • <a href="">Tutorial</a> • </p> </div>

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline. Xiao Wang, Shiao Wang, Chuanming Tang, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang (2023). arXiv preprint arXiv:2309.14611. [Paper] [Code] [DemoVideo]

:dart: Abstract

Tracking with bio-inspired event cameras has drawn increasing attention in recent years. Existing works either fuse aligned RGB and event data for accurate tracking or learn a tracker directly from event data. The first category incurs a higher inference cost, while the second is easily affected by noisy events and the low spatial resolution of event cameras. In this paper, we propose a novel hierarchical knowledge distillation framework that fully exploits multi-modal / multi-view information during training to facilitate knowledge transfer, enabling high-speed, low-latency visual tracking at test time using only event signals. Specifically, a teacher Transformer-based multi-modal tracking framework is first trained by feeding the RGB frames and event streams simultaneously. Then, we design a new hierarchical knowledge distillation strategy that includes pairwise-similarity, feature-representation, and response-map based knowledge distillation to guide the learning of the student Transformer network. Moreover, since existing event-based tracking datasets are all low-resolution ($346 \times 260$), we propose the first large-scale high-resolution ($1280 \times 720$) dataset, named EventVOT. It contains 1141 videos and covers a wide range of categories such as pedestrians, vehicles, UAVs, ping-pong balls, etc. Extensive experiments on both the low-resolution datasets (FE240hz, VisEvent, COESOT) and our newly proposed high-resolution EventVOT dataset fully validate the effectiveness of the proposed method.
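For readers who prefer code, below is a minimal sketch of how the three distillation terms described above (pairwise similarity, feature representation, and response maps) could be combined into one training objective. The tensor shapes, helper names, and loss weights are illustrative assumptions, not the released implementation.

import torch
import torch.nn.functional as F

# Illustrative hierarchical KD objective (names, shapes, and weights are assumptions,
# not the released HDETrack code).
def pairwise_similarity_loss(feat_t, feat_s):
    # Match token-to-token similarity matrices of teacher/student features of shape [B, N, C].
    sim_t = F.normalize(feat_t, dim=-1) @ F.normalize(feat_t, dim=-1).transpose(1, 2)
    sim_s = F.normalize(feat_s, dim=-1) @ F.normalize(feat_s, dim=-1).transpose(1, 2)
    return F.mse_loss(sim_s, sim_t)

def distillation_loss(feat_t, feat_s, resp_t, resp_s, w_sim=1.0, w_feat=1.0, w_resp=1.0):
    l_sim  = pairwise_similarity_loss(feat_t.detach(), feat_s)   # pairwise-similarity KD
    l_feat = F.mse_loss(feat_s, feat_t.detach())                 # feature-representation KD
    l_resp = F.kl_div(resp_s.log_softmax(dim=-1),                # response-map KD
                      resp_t.detach().softmax(dim=-1), reduction='batchmean')
    return w_sim * l_sim + w_feat * l_feat + w_resp * l_resp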

:collision: Update Log

:video_camera: Demo Video

A demo video on YouTube can be viewed by clicking the image below:

<p align="center"> <a href="https://youtu.be/FcwH7tkSXK0"> <img src="https://github.com/Event-AHU/EventVOT_Benchmark/blob/main/figures/EventVOT_youtube.png" alt="DemoVideo" width="800"/> </a> </p> <p align="center"> <img src="./figures/EventVOT_samples.jpg" alt="EventVOT_samples" width="800"/> </p> <p align="center"> <img src="./figures/gif.gif" alt="EventVOT_gif" width="800"/> </p>

:hammer: Environment

A distillation framework for Event Stream-based Visual Object Tracking.

[HDETrack_S_ep0050.pth] Passcode:wsad

[Raw Results] Passcode:wsad

<p align="center"> <img width="85%" src="./figures/HDETrack.jpg" alt="Framework"/> </p>

Install env

conda create -n hdetrack python=3.8
conda activate hdetrack
bash install.sh

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing
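For reference, the kind of entries these files hold looks roughly like the sketch below; the attribute names are assumptions, so check the files actually produced by create_default_local_file.py.

# Illustrative contents of the two generated path files (attribute names are assumptions).

# lib/train/admin/local.py  -- training paths
class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = './output'                  # checkpoints and logs are saved here
        self.pretrained_networks = './pretrained_models' # MAE / CEUTrack weights
        self.eventvot_dir = './data/EventVOT/train'      # EventVOT training split

# lib/test/evaluation/local.py  -- testing paths
def local_env_settings():
    settings = EnvironmentSettings()
    settings.eventvot_path = './data/EventVOT/test'      # EventVOT testing split
    settings.results_path = './output/test/tracking_results'
    return settings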

Then, put the EventVOT tracking dataset in ./data.

Download the pre-trained MAE ViT-Base weights and put them under $/pretrained_models.

Download the pre-trained teacher weights CEUTrack_ep0050.pth and put them under $/pretrained_models.

Download the trained model weights from [HDETrack_S_ep0050.pth] and put them under $/output/checkpoints/train/hdetrack/hdetrack_eventvot to run testing directly.

You can also download these weight files from Dropbox.
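As a quick sanity check that a downloaded checkpoint is intact before running the tracker, you can try loading it with PyTorch (the path below just mirrors the folder mentioned above):

# Load the checkpoint on CPU and peek at its keys (illustrative sanity check only).
import torch
ckpt = torch.load('output/checkpoints/train/hdetrack/hdetrack_eventvot/HDETrack_S_ep0050.pth',
                  map_location='cpu')
print(type(ckpt), list(ckpt.keys())[:5] if isinstance(ckpt, dict) else None)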

Train & Test

# train
python tracking/train.py --script hdetrack --config hdetrack_eventvot --save_dir ./output --mode single --nproc_per_node 1 --use_wandb 0

# test
python tracking/test.py hdetrack hdetrack_eventvot --dataset eventvot --threads 1 --num_gpus 1

Test FLOPs and Speed

Note: The speeds reported in our paper were tested on a single RTX 3090 GPU.
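If you want a rough speed number on your own hardware, a simple timing loop like the one below is usually sufficient; the stand-in network must be replaced by the actual tracker, and FLOPs can be counted with a profiler such as thop or fvcore. This is only a sketch, not the script used for the paper's numbers.

# Rough FPS measurement sketch (requires a CUDA GPU; replace the stand-in network
# with the loaded tracker to get meaningful numbers).
import time
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 3), torch.nn.ReLU()).cuda().eval()  # stand-in
x = torch.randn(1, 3, 256, 256, device='cuda')  # example search-region input

with torch.no_grad():
    for _ in range(20):                 # warm-up
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(200):
        model(x)
    torch.cuda.synchronize()
print('FPS: %.1f' % (200 / (time.time() - start)))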

:dvd: EventVOT Dataset

:floppy_disk: Baidu Netdisk: link:https://pan.baidu.com/s/1NLSnczJ8gnHqF-69bE7Ldg?pwd=wsad code:wsad

:floppy_disk: Baidu Netdisk: link:https://pan.baidu.com/s/1ZTX7O5gWlAdpKmd4R9VhYA?pwd=wsad code:wsad

:floppy_disk: Dropbox: https://www.dropbox.com/scl/fo/fv2e3i0ytrjt14ylz81dx/h?rlkey=6c2wk2z7phmbiwqpfhhe29i5p&dl=0

wget -O EventVOT_dataset.zip https://www.dropbox.com/scl/fo/fv2e3i0ytrjt14ylz81dx/h?rlkey=6c2wk2z7phmbiwqpfhhe29i5p"&"dl=1

The directory should have the following format:

├── EventVOT dataset
    ├── Training Subset (841 videos, 180.7GB)
        ├── recording_2022-10-10_17-28-38
            ├── img
            ├── recording_2022-10-10_17-28-38.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ... 
    ├── Testing Subset (282 videos, 64.88GB)
        ├── recording_2022-10-10_17-28-24
            ├── img
            ├── recording_2022-10-10_17-28-24.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ...
    ├── Validation Subset (18 videos, 4.34GB)
        ├── recording_2022-10-10_17-31-07
            ├── img
            ├── recording_2022-10-10_17-31-07.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ... 

Normally, only the "img" and "..._voxel" files of the EventVOT dataset are needed for training. During testing, only "img" is fed in for inference, as shown in the following figure:

<p align="center"> <img src="./figures/EventVOT_dataset.png" alt="EventVOT_files" width="600"/> </p>

Note: Our EventVOT dataset is a unimodal event dataset. If you need a multimodal RGB-Event dataset, please refer to [COESOT], [VisEvent], or [FELT].

:triangular_ruler: Evaluation Toolkit

  1. Download the EventVOT_eval_toolkit from EventVOT_eval_toolkit (Passcode: wsad) and open it with Matlab (R2020 or later).
  2. Add your tracking results and the baseline results (Passcode: wsad) to $/eventvot_tracking_results/ and modify the tracker names in $/utils/config_tracker.m.
  3. Run Evaluate_EventVOT_benchmark_SP_PR_only.m for the overall performance evaluation, including SR, PR, and NPR (see the sketch after this list for how these metrics are commonly defined).
  4. Run plot_BOC.m for the BOC score evaluation and figure plotting.
  5. Run plot_radar.m for the attribute radar figure.
  6. Run Evaluate_EventVOT_benchmark_attributes.m for the attribute analysis; the figures are saved in $/res_fig/.
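The toolkit itself is Matlab, but for intuition the Python sketch below shows how SR and PR are commonly computed from per-frame IoUs and center-location errors; the official numbers should come from the Matlab scripts above.

# Common definitions of SR and PR for tracking (for intuition only).
import numpy as np

def success_rate(ious, thresholds=np.linspace(0, 1, 21)):
    # SR: mean over IoU thresholds of the fraction of frames whose IoU exceeds the threshold (AUC).
    return float(np.mean([(ious > t).mean() for t in thresholds]))

def precision_rate(center_errors, pixel_threshold=20):
    # PR: fraction of frames whose center-location error is within 20 pixels.
    return float((center_errors <= pixel_threshold).mean())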
<p align="center"> <img width=50%" src="./figures/BOC.png" alt="Radar"/><img width="50%" src="./figures/attributes.png" alt="Radar"/> </p>

:chart_with_upwards_trend: Benchmark Results

The overall performance evaluation, including SR, PR, and NPR.

<p align="left"> <img width="100%" src="./figures/SRPRNPR.png" alt="SRPRNPR"/> </p>

:cupid: Acknowledgement

:newspaper: Citation

@inproceedings{wang2024event,
  title={Event stream-based visual object tracking: A high-resolution benchmark dataset and a novel baseline},
  author={Wang, Xiao and Wang, Shiao and Tang, Chuanming and Zhu, Lin and Jiang, Bo and Tian, Yonghong and Tang, Jin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19248--19257},
  year={2024}
}

Star History

<a href="https://star-history.com/#Event-AHU/EventVOT_Benchmark&Date"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Event-AHU/EventVOT_Benchmark&type=Date&theme=dark" /> <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Event-AHU/EventVOT_Benchmark&type=Date" /> <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Event-AHU/EventVOT_Benchmark&type=Date" /> </picture> </a>