V2E2V: Video-to-events-to-video framework
This repository is related to the TPAMI paper Sensing Diversity and Sparsity Models for Event Generation and Video Reconstruction from Events and to the preliminary work in the ICASSP paper Convolutional ISTA Network with Temporal Consistency Constraints for Video Reconstruction from Event Cameras. It contains the PyTorch code for the Video-to-events-to-video (V2E2V) framework, including the E2V reconstruction networks CISTA-LSTC / CISTA-TC.
This code includes two parts: events-to-video (E2V) reconstruction and video-to-events (V2E) generation. For E2V reconstruction, a convolutional ISTA network (CISTA) is designed based on sparse representation models and the algorithm unfolding strategy. Long short-term temporal consistency (LSTC) constraints are further introduced to enhance temporal coherence. For V2E generation, we introduce the idea of interleaving pixels with different contrast thresholds and lowpass bandwidths, and conjecture that this can help extract more useful information from the intensity. Finally, the V2E2V architecture is used to validate how alternative event generation strategies improve video reconstruction.
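For intuition, here is a minimal PyTorch sketch of one unfolded ISTA iteration built from learned convolutional operators and a soft-thresholding (shrinkage) step. The layer names, channel sizes and the plain soft-threshold are illustrative assumptions, not the actual CISTA-LSTC architecture, which additionally carries long short-term temporal states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ISTAIteration(nn.Module):
    """One unfolded ISTA step: z <- shrink(z - W_t(W_d(z)) + W_e(x), theta).

    A minimal sketch only; the actual CISTA-LSTC network additionally carries
    long short-term temporal states and differs in its layer design."""
    def __init__(self, in_channels=5, sparse_channels=64):
        super().__init__()
        self.encode = nn.Conv2d(in_channels, sparse_channels, 3, padding=1)     # W_e
        self.decode = nn.Conv2d(sparse_channels, in_channels, 3, padding=1)     # W_d
        self.transpose = nn.Conv2d(in_channels, sparse_channels, 3, padding=1)  # W_t
        self.theta = nn.Parameter(torch.full((1, sparse_channels, 1, 1), 0.01)) # learned threshold

    def forward(self, x, z):
        # Gradient-like update on the sparse code, followed by soft-thresholding.
        z = z - self.transpose(self.decode(z)) + self.encode(x)
        return torch.sign(z) * F.relu(z.abs() - self.theta)

# Usage sketch: a 5-bin event voxel grid as input, depth=5 unfolded iterations.
layer = ISTAIteration()
events = torch.randn(1, 5, 180, 240)   # num_bins = 5
z = torch.zeros(1, 64, 180, 240)       # initial sparse code
for _ in range(5):
    z = layer(events, z)
```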
News
- The preprint Enhanced Event-Based Video Reconstruction with Motion Compensation is available, which introduces CISTA-Flow (an updated version of the CISTA network) for events-to-video reconstruction. The related code is released as CISTA-EVREAL.
- We have provided code for generating a simulated dataset based on video-to-events generation: V2E_generation.
- Updated checkpoints: cista-lstc-fixNE.pth.tar (trained using a fixed number of events per reconstruction) and cista-lstc-randNE.pth.tar (trained using a varying number of events per reconstruction).
Requirements
The perceptual loss LPIPS and pytorch_msssim packages are required (a brief usage sketch follows the version list below).
torch==1.13.1
torchvision==0.14.1
lpips==0.1.2
numpy==1.24.3
opencv_contrib_python==4.5.3.56
opencv_python==4.5.5.62
pytorch_msssim==0.2.1
scikit_image==0.18.3
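A minimal sketch of how LPIPS and pytorch_msssim might be used to score a reconstructed frame against ground truth. The helper name evaluate_frame and the (N, 1, H, W), [0, 1] grayscale input convention are assumptions, not the repository's evaluation code.

```python
import torch
import lpips
from pytorch_msssim import ms_ssim

def evaluate_frame(recon, gt, lpips_fn):
    """Score a reconstructed frame against ground truth.

    Hypothetical helper: assumes (N, 1, H, W) float tensors in [0, 1]."""
    mse = torch.mean((recon - gt) ** 2).item()
    msssim = ms_ssim(recon, gt, data_range=1.0).item()
    # LPIPS expects 3-channel inputs scaled to [-1, 1].
    perceptual = lpips_fn(recon.repeat(1, 3, 1, 1) * 2 - 1,
                          gt.repeat(1, 3, 1, 1) * 2 - 1).mean().item()
    return mse, msssim, perceptual

if __name__ == "__main__":
    lpips_fn = lpips.LPIPS(net="alex")      # downloads weights on first use
    recon, gt = torch.rand(1, 1, 180, 240), torch.rand(1, 1, 180, 240)
    print(evaluate_frame(recon, gt, lpips_fn))
```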
E2V: CISTA-LSTC Network
The reconstruction networks are under the folder named e2v. Use train_e2v.py and test_e2v.py to train and test the model.
Training
An example of training the E2V model:
python train_e2v.py \
--path_to_train_data $path_to_train_data \
--model_name "RecNet" \
--model_mode "cista-lstc" \
--batch_size 1 --epochs 60 --lr 1e-4 \
--len_sequence 15 \
--num_bins 5 \
--depth 5 --base_channels 64 \
--num_events 15000
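As a rough illustration of how --num_events and --num_bins relate, the sketch below slices the event stream into packets of a fixed number of events and accumulates each packet into a voxel grid with num_bins temporal bins. The grouping and binning details are assumptions and may differ from the event representation used in train_e2v.py.

```python
import torch

def events_to_voxel_grid(events, num_bins=5, height=180, width=240):
    """Accumulate a packet of events into a (num_bins, H, W) voxel grid.

    events is an (N, 4) tensor with rows [t, x, y, polarity]. A simplified
    sketch; the repository's event representation may differ (e.g. linear
    interpolation between temporal bins)."""
    voxel = torch.zeros(num_bins, height, width)
    t = events[:, 0]
    x, y, p = events[:, 1].long(), events[:, 2].long(), events[:, 3]
    # Normalise the packet's timestamps to bin indices 0 .. num_bins - 1.
    t = (t - t[0]) / max(float(t[-1] - t[0]), 1e-9) * (num_bins - 1)
    b = t.round().long().clamp(0, num_bins - 1)
    voxel.index_put_((b, y, x), p, accumulate=True)
    return voxel

def packets(event_stream, num_events=15000):
    """Yield one voxel grid per packet of num_events consecutive events."""
    for start in range(0, event_stream.shape[0] - num_events + 1, num_events):
        yield events_to_voxel_grid(event_stream[start:start + num_events])
```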
Testing
Use test_data_mode='real' for HQF and ECD data sequences, and test_data_mode='upsampled' for simulated data sequences.
python test_e2v.py \
--path_to_test_model pretrained/RecNet_cista-lstc.pth.tar \
--path_to_test_data data_examples/ECD \
--reader_type 'image_reader' \
--model_mode "cista-lstc" \
--test_data_mode 'real' \
--num_events 15000 \
--test_data_name slider_depth
V2E: Sensing Diversity in Event Generation
The code for the new V2E generation is under the v2e folder. This part is modified from v2e. Note that the input must be high-frame-rate video sequences in order to generate events; otherwise, video interpolation is required. Here we use adaptive upsampling based on Super-Slomo. The checkpoint of the Super-Slomo model can be downloaded from here.
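To make the sensing model concrete, below is a heavily simplified per-pixel event generation loop with a first-order lowpass filter on the log intensity and contrast-threshold crossings. The filter form, reference update and parameter handling are illustrative assumptions and omit many details of the full v2e simulator (e.g. noise and the refractory period).

```python
import math
import torch

def generate_events(frames, timestamps, threshold, cutoff_hz):
    """Simplified per-pixel event generation from a high-frame-rate video.

    frames:     (T, H, W) tensor of intensities in [0, 1]
    timestamps: length-T sequence of frame times in seconds
    threshold:  (H, W) per-pixel contrast threshold map
    cutoff_hz:  (H, W) per-pixel lowpass cutoff map (0 means no filtering)
    Returns a list of (t, x, y, polarity) tuples. Illustrative only.
    """
    log_ref = torch.log(frames[0] + 1e-3)   # per-pixel reference log intensity
    state = log_ref.clone()                 # lowpass filter state
    events = []
    for k in range(1, frames.shape[0]):
        dt = float(timestamps[k] - timestamps[k - 1])
        log_new = torch.log(frames[k] + 1e-3)
        # First-order lowpass on log intensity; alpha = 1 where cutoff is 0.
        alpha = torch.where(cutoff_hz > 0,
                            1 - torch.exp(-2 * math.pi * cutoff_hz * dt),
                            torch.ones_like(cutoff_hz))
        state = state + alpha * (log_new - state)
        # Fire an event wherever the filtered signal crosses the threshold.
        diff = state - log_ref
        ys, xs = torch.where(diff.abs() >= threshold)
        for y, x in zip(ys.tolist(), xs.tolist()):
            pol = 1 if diff[y, x] > 0 else -1
            events.append((float(timestamps[k]), x, y, pol))
            log_ref[y, x] += pol * threshold[y, x]   # move the reference level
    return events
```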
V2E2V framework
Once the E2V network is trained with normal events, we can further train it with the new events generated by V2E. The V2E2V architecture is in model_v2e2v.py. The V2E2V model takes HFR intensity video sequences as input and outputs reconstructed frames. Use train.py and test.py to train and test the model.
Training
An example of training the E2V model with the new events. path_to_e2v is the path to the pretrained E2V model. num_pack_frames determines how many HFR frames are used to generate events for one reconstruction.
python train.py \
--path_to_train_data $path_to_train_data \
--path_to_e2v pretrained/RecNet_cista-lstc.pth.tar \
--model_name "V2E2V" \
--image_dim 180 240 \
--num_pack_frames 10 \
-s 15 \
--C 0.6 \
--pl 1.5 --ps 0.5 \
--cutoff_hz 200 \
--ql 1 --qs 0 \
--refractory_period_s 0.001 \
--epochs 1
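The sketch below shows one possible way the interleaving flags could translate into per-pixel parameter maps, assuming --pl/--ps scale the base threshold --C and --ql/--qs scale --cutoff_hz for two checkerboard-interleaved pixel groups; the actual interleaving pattern and scaling in the code may differ.

```python
import torch

def interleaved_maps(height=180, width=240, C=0.6, pl=1.5, ps=0.5,
                     cutoff_hz=200.0, ql=1.0, qs=0.0):
    """Build checkerboard-interleaved per-pixel threshold and cutoff maps.

    Assumes pl/ps scale the base contrast threshold C and ql/qs scale
    cutoff_hz for the two pixel groups; the actual pattern and scaling in
    the repository may differ."""
    yy, xx = torch.meshgrid(torch.arange(height), torch.arange(width), indexing="ij")
    group_a = (yy + xx) % 2 == 0                            # two interleaved pixel groups
    threshold = torch.full((height, width), C * ps)
    threshold[group_a] = C * pl
    cutoff = torch.full((height, width), cutoff_hz * qs)    # 0 disables the lowpass
    cutoff[group_a] = cutoff_hz * ql
    return threshold, cutoff

# These maps can be fed to a per-pixel event generator such as the
# generate_events sketch above.
threshold_map, cutoff_map = interleaved_maps()
```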
Testing
An example of running the V2E2V model to test the reconstruction results based on the new events. Specify reader_type for different input data formats (video or images, LFR or HFR): reader_type='video'/'image_reader'/'upsampling'. For real LFR video sequences, use reader_type='upsampling', while reader_type='image_reader' is used for simulated HFR video sequences.
python test.py \
--path_to_test_model pretrained/V2E2V_C0.6_1.5_0.5_fc200.0_1.0_0.0.pth.tar \
--reader_type upsampling \
--path_to_test_data data_examples/HQF \
--num_pack_frames 10 \
--test_data_name still_life \
--test_img_num 300
Dataset
We train and test our networks on a simulated dataset. Code for generating the simulated dataset based on video-to-events generation is provided (see V2E_generation above).
Evaluation can also be performed on the real HQF and ECD data sequences.
Examples of test data can be downloaded here. Consult the README for details.
Citation
If you use any of this code, please cite the publications as follows:
@article{liu_sensing_2023,
title={Sensing Diversity and Sparsity Models for Event Generation and Video Reconstruction from Events},
author={Liu, Siying and Dragotti, Pier Luigi},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
pages={1-16},
publisher={IEEE},
doi={10.1109/TPAMI.2023.3278940}
}
@inproceedings{liu_convolutional_2022,
title={Convolutional ISTA Network with Temporal Consistency Constraints for Video Reconstruction from Event Cameras},
author={Liu, Siying and Alexandru, Roxana and Dragotti, Pier Luigi},
booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1935--1939},
year={2022},
organization={IEEE},
doi={10.1109/ICASSP43922.2022.9746331}
}