
<div align="center"> <img src="https://github.com/Event-AHU/EventVOT_Benchmark/blob/main/figures/EventVOT_white.png" width="600">

The First High Definition (HD) Event-based Visual Object Tracking Benchmark Dataset


<p align="center"> • <a href="">arXiv</a> • <a href="">Baselines</a> • <a href="">DemoVideo</a> • <a href="">Tutorial</a> • </p> </div>

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline. Xiao Wang, Shiao Wang, Chuanming Tang, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang (2023). arXiv preprint arXiv:2309.14611. [Paper] [Code] [DemoVideo]

:dart: Abstract

Tracking with bio-inspired event cameras has drawn increasing attention in recent years. Existing works either fuse aligned RGB and event data for accurate tracking or learn a tracker directly from event data. The first category incurs a higher inference cost, while the second is easily affected by noisy events and the low spatial resolution of event cameras. In this paper, we propose a novel hierarchical knowledge distillation framework that fully exploits multi-modal / multi-view information during training to facilitate knowledge transfer, enabling high-speed, low-latency visual tracking at test time using only event signals. Specifically, a teacher Transformer-based multi-modal tracking framework is first trained by feeding the RGB frames and event streams simultaneously. Then, we design a new hierarchical knowledge distillation strategy that includes pairwise-similarity, feature-representation, and response-map based knowledge distillation to guide the learning of the student Transformer network. Moreover, since existing event-based tracking datasets are all low-resolution ($346 \times 260$), we propose the first large-scale high-resolution ($1280 \times 720$) dataset, named EventVOT. It contains 1141 videos and covers a wide range of categories such as pedestrians, vehicles, UAVs, ping-pong balls, etc. Extensive experiments on both the low-resolution datasets (FE240hz, VisEvent, COESOT) and our newly proposed high-resolution EventVOT dataset fully validate the effectiveness of the proposed method.
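For readers who prefer code, below is a minimal sketch of how the three distillation terms described above (pairwise similarity, feature representation, and response maps) could be combined into one training objective. The tensor shapes, helper names, and loss weights are illustrative assumptions, not the released implementation.

import torch
import torch.nn.functional as F

# Illustrative hierarchical KD objective (names, shapes, and weights are assumptions,
# not the released HDETrack code).
def pairwise_similarity_loss(feat_t, feat_s):
    # Match token-to-token similarity matrices of teacher/student features of shape [B, N, C].
    sim_t = F.normalize(feat_t, dim=-1) @ F.normalize(feat_t, dim=-1).transpose(1, 2)
    sim_s = F.normalize(feat_s, dim=-1) @ F.normalize(feat_s, dim=-1).transpose(1, 2)
    return F.mse_loss(sim_s, sim_t)

def distillation_loss(feat_t, feat_s, resp_t, resp_s, w_sim=1.0, w_feat=1.0, w_resp=1.0):
    l_sim  = pairwise_similarity_loss(feat_t.detach(), feat_s)   # pairwise-similarity KD
    l_feat = F.mse_loss(feat_s, feat_t.detach())                 # feature-representation KD
    l_resp = F.kl_div(resp_s.log_softmax(dim=-1),                # response-map KD
                      resp_t.detach().softmax(dim=-1), reduction='batchmean')
    return w_sim * l_sim + w_feat * l_feat + w_resp * l_resp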

:collision: Update Log

:video_camera: Demo Video

A demo video on YouTube can be viewed by clicking the image below:

<p align="center"> <a href="https://youtu.be/FcwH7tkSXK0"> <img src="https://github.com/Event-AHU/EventVOT_Benchmark/blob/main/figures/EventVOT_youtube.png" alt="DemoVideo" width="800"/> </a> </p> <p align="center"> <img src="./figures/EventVOT_samples.jpg" alt="EventVOT_samples" width="800"/> </p> <p align="center"> <img src="./figures/gif.gif" alt="EventVOT_gif" width="800"/> </p>

:hammer: Environment

A distillation framework for Event Stream-based Visual Object Tracking.

[HDETrack_S_ep0050.pth] Passcode:wsad

[Raw Results] Passcode:wsad

<p align="center"> <img width="85%" src="./figures/HDETrack.jpg" alt="Framework"/> </p>

Install env

conda create -n hdetrack python=3.8
conda activate hdetrack
bash install.sh

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing
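For reference, the kind of entries these files hold looks roughly like the sketch below; the attribute names are assumptions, so check the files actually produced by create_default_local_file.py.

# Illustrative contents of the two generated path files (attribute names are assumptions).

# lib/train/admin/local.py  -- training paths
class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = './output'                  # checkpoints and logs are saved here
        self.pretrained_networks = './pretrained_models' # MAE / CEUTrack weights
        self.eventvot_dir = './data/EventVOT/train'      # EventVOT training split

# lib/test/evaluation/local.py  -- testing paths
def local_env_settings():
    settings = EnvironmentSettings()
    settings.eventvot_path = './data/EventVOT/test'      # EventVOT testing split
    settings.results_path = './output/test/tracking_results'
    return settings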

Then, put the EventVOT tracking dataset in ./data.

Download the pre-trained MAE ViT-Base weights and put them under $/pretrained_models.

Download the pre-trained teacher weights CEUTrack_ep0050.pth and put them under $/pretrained_models.

Download the trained model weights from [HDETrack_S_ep0050.pth] and put them under $/output/checkpoints/train/hdetrack/hdetrack_eventvot to run testing directly.

You can also download these weight files from Dropbox.
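As a quick sanity check that a downloaded checkpoint is intact before running the tracker, you can try loading it with PyTorch (the path below just mirrors the folder mentioned above):

# Load the checkpoint on CPU and peek at its keys (illustrative sanity check only).
import torch
ckpt = torch.load('output/checkpoints/train/hdetrack/hdetrack_eventvot/HDETrack_S_ep0050.pth',
                  map_location='cpu')
print(type(ckpt), list(ckpt.keys())[:5] if isinstance(ckpt, dict) else None)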

Train & Test

# train
python tracking/train.py --script hdetrack --config hdetrack_eventvot --save_dir ./output --mode single --nproc_per_node 1 --use_wandb 0

# test
python tracking/test.py hdetrack hdetrack_eventvot --dataset eventvot --threads 1 --num_gpus 1

Test FLOPs and Speed

Note: The speeds reported in our paper were tested on a single RTX 3090 GPU.
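If you want a rough speed number on your own hardware, a simple timing loop like the one below is usually sufficient; the stand-in network must be replaced by the actual tracker, and FLOPs can be counted with a profiler such as thop or fvcore. This is only a sketch, not the script used for the paper's numbers.

# Rough FPS measurement sketch (requires a CUDA GPU; replace the stand-in network
# with the loaded tracker to get meaningful numbers).
import time
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 3), torch.nn.ReLU()).cuda().eval()  # stand-in
x = torch.randn(1, 3, 256, 256, device='cuda')  # example search-region input

with torch.no_grad():
    for _ in range(20):                 # warm-up
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(200):
        model(x)
    torch.cuda.synchronize()
print('FPS: %.1f' % (200 / (time.time() - start)))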

:dvd: EventVOT Dataset

:floppy_disk: Baidu Netdisk: link:https://pan.baidu.com/s/1NLSnczJ8gnHqF-69bE7Ldg?pwd=wsad code:wsad

:floppy_disk: Baidu Netdisk: link:https://pan.baidu.com/s/1ZTX7O5gWlAdpKmd4R9VhYA?pwd=wsad code:wsad

:floppy_disk: Dropbox: https://www.dropbox.com/scl/fo/fv2e3i0ytrjt14ylz81dx/h?rlkey=6c2wk2z7phmbiwqpfhhe29i5p&dl=0

wget -O EventVOT_dataset.zip https://www.dropbox.com/scl/fo/fv2e3i0ytrjt14ylz81dx/h?rlkey=6c2wk2z7phmbiwqpfhhe29i5p"&"dl=1

The directory should have the following format:

├── EventVOT dataset
    ├── Training Subset (841 videos, 180.7GB)
        ├── recording_2022-10-10_17-28-38
            ├── img
            ├── recording_2022-10-10_17-28-38.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ... 
    ├── Testing Subset (282 videos, 64.88GB)
        ├── recording_2022-10-10_17-28-24
            ├── img
            ├── recording_2022-10-10_17-28-24.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ...
    ├── Validation Subset (18 videos, 4.34GB)
        ├── recording_2022-10-10_17-31-07
            ├── img
            ├── recording_2022-10-10_17-31-07.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ... 

Normally, only the "img" and "..._voxel" files of the EventVOT dataset are needed for training. During testing, only "img" is fed in for inference, as shown in the following figure:

<p align="center"> <img src="./figures/EventVOT_dataset.png" alt="EventVOT_files" width="600"/> </p>

Note: Our EventVOT dataset is a unimodal event dataset. If you need a multimodal RGB-Event dataset, please refer to [COESOT], [VisEvent], or [FELT].

:triangular_ruler: Evaluation Toolkit

  1. Download the EventVOT_eval_toolkit from EventVOT_eval_toolkit (Passcode: wsad) and open it with Matlab (R2020 or later).
  2. Add your tracking results and the baseline results (Passcode: wsad) to $/eventvot_tracking_results/ and modify the tracker names in $/utils/config_tracker.m.
  3. Run Evaluate_EventVOT_benchmark_SP_PR_only.m for the overall performance evaluation, including SR, PR, and NPR (see the sketch after this list for how these metrics are commonly defined).
  4. Run plot_BOC.m for the BOC score evaluation and figure plotting.
  5. Run plot_radar.m for the attribute radar figure.
  6. Run Evaluate_EventVOT_benchmark_attributes.m for the attribute analysis; the figures are saved in $/res_fig/.
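The toolkit itself is Matlab, but for intuition the Python sketch below shows how SR and PR are commonly computed from per-frame IoUs and center-location errors; the official numbers should come from the Matlab scripts above.

# Common definitions of SR and PR for tracking (for intuition only).
import numpy as np

def success_rate(ious, thresholds=np.linspace(0, 1, 21)):
    # SR: mean over IoU thresholds of the fraction of frames whose IoU exceeds the threshold (AUC).
    return float(np.mean([(ious > t).mean() for t in thresholds]))

def precision_rate(center_errors, pixel_threshold=20):
    # PR: fraction of frames whose center-location error is within 20 pixels.
    return float((center_errors <= pixel_threshold).mean())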
<p align="center"> <img width=50%" src="./figures/BOC.png" alt="Radar"/><img width="50%" src="./figures/attributes.png" alt="Radar"/> </p>

:chart_with_upwards_trend: Benchmark Results

The overall performance evaluation, including SR, PR, and NPR.

<p align="left"> <img width="100%" src="./figures/SRPRNPR.png" alt="SRPRNPR"/> </p>

:cupid: Acknowledgement

:newspaper: Citation

@inproceedings{wang2024event,
  title={Event stream-based visual object tracking: A high-resolution benchmark dataset and a novel baseline},
  author={Wang, Xiao and Wang, Shiao and Tang, Chuanming and Zhu, Lin and Jiang, Bo and Tian, Yonghong and Tang, Jin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19248--19257},
  year={2024}
}

Star History

<a href="https://star-history.com/#Event-AHU/EventVOT_Benchmark&Date"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Event-AHU/EventVOT_Benchmark&type=Date&theme=dark" /> <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Event-AHU/EventVOT_Benchmark&type=Date" /> <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Event-AHU/EventVOT_Benchmark&type=Date" /> </picture> </a>