State Space Models for Event Cameras (CVPR 2024 Spotlight)

<p align="center"> <a href="https://www.youtube.com/watch?v=WRZZJn6Me9M"> <img src="https://github.com/uzh-rpg/ssms_event_cameras/blob/master/scripts/zubic_cvpr2024_youtube.png" alt="youtube_video"/> </a> </p>

This is the official PyTorch implementation of the CVPR 2024 paper State Space Models for Event Cameras.

🖼️ Check out our poster here! 🖼️

:white_check_mark: Updates

Citation

If you find this work and/or code useful, please cite our paper:

@InProceedings{Zubic_2024_CVPR,
  author    = {Zubi\'c, Nikola and Gehrig, Mathias and Scaramuzza, Davide},
  title     = {State Space Models for Event Cameras},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024},
}

SSM-ViT

Installation

Conda

We highly recommend using Mambaforge to reduce the installation time.

conda create -y -n events_signals python=3.11
conda activate events_signals
conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install lightning wandb pandas plotly opencv-python tabulate pycocotools bbox-visualizer StrEnum hydra-core einops torchdata tqdm numba h5py hdf5plugin lovely-tensors tensorboardX pykeops scikit-learn          
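
As a quick sanity check (not required, but useful before training), you can verify that the CUDA build of PyTorch is active and a GPU is visible from the new environment:

# should print something like: 2.2.1 11.8 True
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"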

Required Data

To evaluate or train the S5-ViT model, you will need to download the required preprocessed datasets:

<table><tbody> <th valign="bottom"></th> <th valign="bottom">1 Mpx</th> <th valign="bottom">Gen1</th> <tr><td align="left">pre-processed dataset</td> <td align="center"><a href="https://download.ifi.uzh.ch/rpg/RVT/datasets/preprocessed/gen4.tar">download</a></td> <td align="center"><a href="https://download.ifi.uzh.ch/rpg/RVT/datasets/preprocessed/gen1.tar">download</a></td> </tr> <tr><td align="left">crc32</td> <td align="center"><tt>c5ec7c38</tt></td> <td align="center"><tt>5acab6f3</tt></td> </tr> </tbody></table>
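
As an example, the Gen1 tarball can be downloaded and verified against the CRC-32 value from the table before extraction; the 1 Mpx tarball (gen4.tar) works the same way, and the extraction location is up to you:

# download the pre-processed Gen1 dataset
wget https://download.ifi.uzh.ch/rpg/RVT/datasets/preprocessed/gen1.tar
# compute the CRC-32 in chunks and compare it with the table value (5acab6f3)
python -c "
import zlib
crc = 0
with open('gen1.tar', 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):
        crc = zlib.crc32(chunk, crc)
print(format(crc & 0xffffffff, '08x'))"
# extract, then point dataset.path at the extracted folder in the commands below
tar -xf gen1.tar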

You may also pre-process the datasets yourself by following the pre-processing instructions.

Pre-trained Checkpoints

1 Mpx

<table><tbody> <th valign="bottom"></th> <th valign="bottom">S5-ViT-Base</th> <th valign="bottom">S5-ViT-Small</th> <tr><td align="left">pre-trained checkpoint</td> <td align="center"><a href="https://download.ifi.uzh.ch/rpg/CVPR24_Zubic/gen4_base.ckpt">download</a></td> <td align="center"><a href="https://download.ifi.uzh.ch/rpg/CVPR24_Zubic/gen4_small.ckpt">download</a></td> </tr> </tbody></table>

Gen1

<table><tbody> <th valign="bottom"></th> <th valign="bottom">S5-ViT-Base</th> <th valign="bottom">S5-ViT-Small</th> <tr><td align="left">pre-trained checkpoint</td> <td align="center"><a href="https://download.ifi.uzh.ch/rpg/CVPR24_Zubic/gen1_base.ckpt">download</a></td> <td align="center"><a href="https://download.ifi.uzh.ch/rpg/CVPR24_Zubic/gen1_small.ckpt">download</a></td> </tr> </tbody></table>
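
The checkpoints can be fetched directly from the links above, for example (the checkpoints directory and file names are arbitrary choices):

mkdir -p checkpoints
# Gen1 base model; swap the URL for any of the other checkpoints above
wget -O checkpoints/gen1_base.ckpt https://download.ifi.uzh.ch/rpg/CVPR24_Zubic/gen1_base.ckpt
CKPT_PATH=checkpoints/gen1_base.ckpt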

Evaluation

1 Mpx

python RVT/validation.py dataset=gen4 dataset.path=${DATA_DIR} checkpoint=${CKPT_PATH} \
use_test_set=1 hardware.gpus=${GPU_ID} +experiment/gen4="${MDL_CFG}.yaml" \
batch_size.eval=12 model.postprocess.confidence_threshold=0.001
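
For reference, a fully spelled-out 1 Mpx run could look like the following; MDL_CFG=base is an assumption based on the checkpoint names, so use whichever experiment config matches your checkpoint:

DATA_DIR=/path/to/gen4                  # root of the extracted pre-processed 1 Mpx dataset
CKPT_PATH=checkpoints/gen4_base.ckpt
GPU_ID=0
MDL_CFG=base                            # assumed config name; pick the one matching the checkpoint
python RVT/validation.py dataset=gen4 dataset.path=${DATA_DIR} checkpoint=${CKPT_PATH} \
use_test_set=1 hardware.gpus=${GPU_ID} +experiment/gen4="${MDL_CFG}.yaml" \
batch_size.eval=12 model.postprocess.confidence_threshold=0.001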

Gen1

python RVT/validation.py dataset=gen1 dataset.path=${DATA_DIR} checkpoint=${CKPT_PATH} \
use_test_set=1 hardware.gpus=${GPU_ID} +experiment/gen1="${MDL_CFG}.yaml" \
batch_size.eval=8 model.postprocess.confidence_threshold=0.001
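
Analogously for Gen1, e.g. with the small checkpoint, only the variables change (MDL_CFG=small is again an assumed config name):

DATA_DIR=/path/to/gen1                  # root of the extracted pre-processed Gen1 dataset
CKPT_PATH=checkpoints/gen1_small.ckpt
GPU_ID=0
MDL_CFG=small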

We use the same batch size for evaluation as for training: 12 for the 1 Mpx dataset and 8 for the Gen1 dataset.

Evaluation results

Evaluation should reproduce the results shown below:

<p align="center"> <img src="https://github.com/uzh-rpg/ssms_event_cameras/blob/master/scripts/checkpoints.png"> </p>

Training

1 Mpx

GPU_IDS=[0,1]
BATCH_SIZE_PER_GPU=6
TRAIN_WORKERS_PER_GPU=12
EVAL_WORKERS_PER_GPU=4
python RVT/train.py model=rnndet dataset=gen4 dataset.path=${DATA_DIR} wandb.project_name=ssms_event_cameras \
wandb.group_name=1mpx +experiment/gen4="${MDL_CFG}.yaml" hardware.gpus=${GPU_IDS} \
batch_size.train=${BATCH_SIZE_PER_GPU} batch_size.eval=${BATCH_SIZE_PER_GPU} \
hardware.num_workers.train=${TRAIN_WORKERS_PER_GPU} hardware.num_workers.eval=${EVAL_WORKERS_PER_GPU}

If, for example, you want to run the training on 4 GPUs instead, simply adapt GPU_IDS and BATCH_SIZE_PER_GPU accordingly so that the total batch size stays the same (2 × 6 = 4 × 3 = 12):

GPU_IDS=[0,1,2,3]
BATCH_SIZE_PER_GPU=3

Gen1

GPU_IDS=0
BATCH_SIZE_PER_GPU=8
TRAIN_WORKERS_PER_GPU=24
EVAL_WORKERS_PER_GPU=8
python RVT/train.py model=rnndet dataset=gen1 dataset.path=${DATA_DIR} wandb.project_name=ssms_event_cameras \
wandb.group_name=gen1 +experiment/gen1="${MDL_CFG}.yaml" hardware.gpus=${GPU_IDS} \
batch_size.train=${BATCH_SIZE_PER_GPU} batch_size.eval=${BATCH_SIZE_PER_GPU} \
hardware.num_workers.train=${TRAIN_WORKERS_PER_GPU} hardware.num_workers.eval=${EVAL_WORKERS_PER_GPU}
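
Multi-GPU training for Gen1 follows the same pattern as for 1 Mpx; for example, to train on two GPUs while keeping the total batch size of 8, adapt the variables accordingly:

GPU_IDS=[0,1]
BATCH_SIZE_PER_GPU=4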

Code Acknowledgments

This project builds on code from the following projects: