# ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images [CVPR 2023]
:sparkles:Paper :sparkles:Poster :sparkles:Presentation (YouTube) :sparkles:Slide :sparkles:OpenReview
## Environment

PyTorch 1.8.1 & CUDA 10.1. Please refer to requirements.txt for details.
If your CUDA version is 10.1, you can install the environment directly with the following commands:

```
conda create -n scandmm python==3.7
conda activate scandmm
pip install -r requirements.txt
```
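To confirm the installed build matches before training, a quick check with standard PyTorch calls (a minimal sketch, nothing repo-specific) is:

```python
# Quick environment check: the repo targets PyTorch 1.8.1 with CUDA 10.1.
import torch

print(torch.__version__)          # expect 1.8.1
print(torch.version.cuda)         # expect 10.1
print(torch.cuda.is_available())  # should be True on a CUDA-capable machine
```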
## Training

- To reproduce the training and validation datasets, please refer to data_process.py. Alternatively, use the ready-to-use data.
- Execute (see the sketch after this list for scripting several runs):

```
python train.py --seed=1234 --dataset='./Datasets/Sitzmann.pkl' --lr=0.0003 --bs=64 --epochs=500 --save_root='./model/'
```

- Check the training logs and checkpoints in Log (created automatically) and the ./model folder, respectively.
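If you want to script several runs (e.g., a small learning-rate sweep), a hypothetical wrapper around the CLI above might look like the following; the flag values are just those from the command above, and this wrapper is not part of the repo:

```python
# Hypothetical convenience wrapper (not part of the repo): launch train.py
# over several learning rates via the CLI flags documented above. Checkpoint
# names encode lr/bs/epoch, so the runs can share one save_root.
import subprocess

for lr in (1e-4, 3e-4, 1e-3):
    subprocess.run(
        ["python", "train.py",
         "--seed=1234",
         "--dataset=./Datasets/Sitzmann.pkl",
         f"--lr={lr}",
         "--bs=64",
         "--epochs=500",
         "--save_root=./model/"],
        check=True,
    )
```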
## Test

- Prepare the test images and put them in a folder (e.g., ./demo/input).
- Create a folder to store the results (e.g., ./demo/output).
- Prepare a pre-trained weights file (e.g., './model/model_lr-0.0003_bs-64_epoch-435.pkl').
- Execute:

```
python inference.py --model='./model/model_lr-0.0003_bs-64_epoch-435.pkl' --inDir='./demo/input' --outDir='./demo/output' --n_scanpaths=200 --length=20 --if_plot=True
```

- Modify n_scanpaths and length to change the number and length of the produced scanpaths. Please refer to inference.py for more details about the produced scanpaths.
- Check the results (a hedged plotting sketch follows the examples below):

sp_P48_5376x2688.png

```python
import numpy as np

scanpaths = np.load('P48_5376x2688.npy')
print(scanpaths.shape)
# (200, 20, 2)
# (n_scanpaths, length, (y, x)); (y, x) are normalized coordinates in the
# range [0, 1], where y = 0 / x = 0 indicate the top / left edge.
```

sp_P8_7500x3750.png

```python
scanpaths = np.load('P8_7500x3750.npy')
print(scanpaths.shape)
# (200, 20, 2)
```
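Since the coordinates are normalized, mapping a scanpath onto its equirectangular image only requires scaling by the image height and width. The sketch below is an assumed example, not the repository's visualization code; the file paths follow the demo layout above, so adjust them to your own files:

```python
# Minimal sketch (not the repo's plotting code): overlay one predicted
# scanpath on its equirectangular source image.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

img = mpimg.imread('./demo/input/P48_5376x2688.png')    # H x W x 3 array
scanpaths = np.load('./demo/output/P48_5376x2688.npy')  # (200, 20, 2)

H, W = img.shape[:2]
y, x = scanpaths[0, :, 0], scanpaths[0, :, 1]  # first scanpath, normalized
# Scale normalized (y, x) to pixel coordinates; for spherical coordinates,
# latitude = (0.5 - y) * 180 and longitude = (x - 0.5) * 360 would apply.
plt.imshow(img)
plt.plot(x * W, y * H, '-o', color='red', markersize=4, linewidth=1)
plt.axis('off')
plt.show()
```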
## Bibtex

```bibtex
@InProceedings{scandmm2023,
  title     = {ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images},
  author    = {Xiangjie Sui and Yuming Fang and Hanwei Zhu and Shiqi Wang and Zhou Wang},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2023}
}
```
## Acknowledgment

The authors would like to thank Kede Ma for his inspiration, Daniel Martin et al. for publishing the ScanGAN model and visualization functions, and Eli Bingham et al. for the implementation of Pyro. We sincerely appreciate their contributions.