STFormer for video SCI
This repo is the implementation of "Spatial-Temporal Transformer for Video Snapshot Compressive Imaging".
Abstract
Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames through different masks and sum these modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed optical encoder); algorithms are then employed, when needed, to reconstruct the desired high-speed frames (dubbed software decoder). In this paper, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both the spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer.
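To make the forward model concrete, below is a minimal sketch of the optical encoder described above: B high-speed frames are element-wise modulated by B binary masks and summed into a single 2D measurement. All shapes and variable names here are illustrative and not taken from this repository's code.

```python
import numpy as np

B, H, W = 8, 256, 256                        # compression ratio (frames) and frame size (illustrative)
frames = np.random.rand(B, H, W)             # high-speed video frames X_t
masks = np.random.randint(0, 2, (B, H, W))   # binary modulation masks M_t

# Optical encoder: each frame is modulated by its mask, then all
# modulated frames are summed into one 2D snapshot measurement Y.
measurement = np.sum(masks * frames, axis=0)  # shape (H, W)

# The software decoder (e.g., STFormer) recovers the B frames from
# `measurement` and the known `masks`.
```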
Testing Results on Simulation Datasets
<div align="center">
  <img src="docs/gif/Bosphorus.gif" />
  <img src="docs/gif/ShakeNDry.gif" />
  Fig1. Reconstructed Color Data via Different Algorithms
</div>

Installation
Please see the Installation Manual for STFormer installation.
Training
Multi-GPU and single-GPU training are both supported. First, download the DAVIS 2017 dataset from the DAVIS website, then modify the data_root value in the configs/_base_/davis.py file so that data_root points to your training dataset path (a minimal sketch of this edit is shown below).
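The edit looks roughly like the following; this is a hedged sketch assuming data_root is a plain string in the config, and the path is a placeholder for your own dataset location:

```python
# configs/_base_/davis.py (excerpt, illustrative)
data_root = '/path/to/DAVIS/JPEGImages/480p'  # replace with your local DAVIS 2017 training data path
```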
Launch multi-GPU training with the command below:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=3278 tools/train.py configs/STFormer/stformer_base.py --distributed=True
Launch single-GPU training with the command below. GPU 0 is used by default; you can select a different GPU by setting CUDA_VISIBLE_DEVICES (for example, CUDA_VISIBLE_DEVICES=2).
python tools/train.py configs/STFormer/stformer_base.py
Testing STFormer on Grayscale Simulation Dataset
Specify the path of the weight file, then run the six grayscale simulation benchmarks by executing the command below.
python tools/test.py configs/STFormer/stformer_base.py --weights=checkpoints/stformer_base.pth
Testing STFormer on Color Simulation Dataset
First, download the model weight file (checkpoints/stformer/stformer_base_mid_color.pth) and the test data (datasets/middle_scale) from Dropbox or BaiduNetdisk, and place them in the checkpoints folder and the test_datasets folder, respectively. Then, execute the command below to run STFormer on the six mid-scale color simulation benchmarks.
python tools/test.py configs/STFormer/stformer_base_mid_color.py --weights=checkpoints/stformer_base_mid_color.pth
Testing STFormer on Real Dataset
Download the model weight file (checkpoints/stformer/stformer_base_real_cr10.pth) from Dropbox or BaiduNetdisk, then run STFormer on the real dataset by executing the command below.
python tools/test_real_data.py configs/STFormer/stformer_base_real_cr10.py --weights=checkpoints/stformer_base_real_cr10.pth
Notice:
The released results cover real data with a compression ratio (cr) of 10. For other compression ratios, change the cr value in stformer_base_real_cr10.py and retrain the model (a sketch of this one-line change follows).
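A hedged sketch of the change, assuming cr is stored as a plain integer in the config file (all surrounding settings omitted):

```python
# configs/STFormer/stformer_base_real_cr10.py (excerpt, illustrative)
cr = 20  # was 10; set this to the compression ratio of your real measurements, then retrain
```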
Citation
@article{wang2023spatial,
  author={Wang, Lishun and Cao, Miao and Zhong, Yong and Yuan, Xin},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Spatial-Temporal Transformer for Video Snapshot Compressive Imaging},
  year={2023},
  volume={45},
  number={7},
  pages={9072-9089},
  doi={10.1109/TPAMI.2022.3225382}
}
Acknowledgement
The code is based on CACTI; we also refer to the code of Swin Transformer, Video Swin Transformer, RevSCI, and Two Stage. Thanks for their awesome work.