Learning Depth from Focus in the Wild
[ECCV2022] Official PyTorch implementation of "Learning Depth from Focus in the Wild"
- Paper
- Contact email: cywon1997@gm.gist.ac.kr
Requirements
- Python == 3.7.7
- imageio==2.16.1
- h5py==3.6.0
- numpy==1.21.5
- matplotlib==3.5.1
- opencv-python==4.5.5.62
- OpenEXR==1.3.7
- Pillow==7.2.0
- scikit-image==0.19.2
- scipy==1.7.3
- torch==1.6.0
- torchvision==0.7.0
- tensorboard==2.8.0
- tqdm==4.46.0
- mat73==0.58
- typing-extensions==4.1.1
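The following is a minimal sketch (not part of the repository) for checking that the core packages above are importable and reporting their installed versions; the module list is taken from the requirements, with cv2 provided by opencv-python, skimage by scikit-image, and PIL by Pillow:

```python
# check_env.py -- quick sanity check of the pinned dependencies (sketch, not part of the repo).
import importlib

# Import names of the core packages from the requirements list above.
MODULES = ["torch", "torchvision", "numpy", "h5py", "imageio",
           "cv2", "skimage", "scipy", "PIL", "matplotlib", "tqdm"]

for name in MODULES:
    try:
        mod = importlib.import_module(name)
        print(f"{name:12s} {getattr(mod, '__version__', 'unknown')}")
    except ImportError:
        print(f"{name:12s} MISSING")
```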
Depth Estimation Network
- Our depth estimation network is implemented based on the code released by PSMNet [1] and CFNet [2].
1. Download Datasets
- DDFF-12-Scene Dataset [5]: download from the official link (https://vision.in.tum.de/webarchive/hazirbas/ddff12scene/ddff-dataset-test.h5); a quick inspection sketch follows this list.
- DefocusNet Dataset [6], 4D Light Field Dataset [9], Middlebury Dataset [8]: follow the procedure described on the official GitHub of AiFDepthNet [7] (https://github.com/albert100121/AiFDepthNet).
- Smartphone dataset [10]: download the dataset from the official website of Learning to Autofocus (https://learntoautofocus-google.github.io/).
- Put the datasets in folders under 'Depth_Estimation_Test/Datasets/'.
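As a quick check that the DDFF-12-Scene file landed in the right place, the sketch below only lists the datasets stored in the downloaded .h5 file; the path reflects the folder suggested above (adjust it to your own layout), and no key names are assumed:

```python
# inspect_ddff.py -- list the contents of the DDFF-12-Scene test file (sketch, not part of the repo).
import h5py

# Path assumes the file was placed under Depth_Estimation_Test/Datasets/ as described above.
path = "Depth_Estimation_Test/Datasets/ddff-dataset-test.h5"

def show(name, obj):
    # Print every dataset with its shape and dtype.
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")

with h5py.File(path, "r") as f:
    f.visititems(show)
```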
2. Use the pretrained model
Alternatively, train the model yourself using the training code.
- Put the datasets in folders under 'train_codes/Datasets/'.
- Run the training scripts in 'train_codes':
python train_code_[Dataset].py --lr [learning rate]
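For example, to train on the DDFF-12-Scene data with a learning rate of 0.001 (the per-dataset script name and the learning-rate value here are only illustrative assumptions; check the actual file names in 'train_codes'):
python train_code_DDFF.py --lr 0.001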
3. Run test.py
python test.py --dataset [Dataset]
Simulator
1. Download NYU v2 dataset [4]
- Official link: http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
- Put 'nyu_depth_v2_labeled.mat' in the 'Simulator' folder (a sanity-check sketch follows step 2).
- This simulator is implemented based on the code released by [3].
2. Run the code.
python synthetic_blur_movement.py
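Before running the simulator, a quick way to confirm that 'nyu_depth_v2_labeled.mat' is readable is the sketch below (not part of the repository). The labeled NYU v2 file is a MATLAB v7.3 file, so it opens with h5py (or mat73 from the requirements); 'images' and 'depths' are the variable names NYU v2 ships with, but verify them against the printed keys if your copy differs:

```python
# inspect_nyu.py -- sanity-check the NYU Depth v2 labeled .mat file (sketch, not part of the repo).
import h5py

# Path assumes the file was placed in the 'Simulator' folder as described above.
path = "Simulator/nyu_depth_v2_labeled.mat"

with h5py.File(path, "r") as f:
    print(list(f.keys()))        # expect entries such as 'images' and 'depths'
    images = f["images"]          # RGB images (MATLAB stores the axes permuted)
    depths = f["depths"]          # depth maps in meters
    print("images:", images.shape, images.dtype)
    print("depths:", depths.shape, depths.dtype)
```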
End-To-End Network
1. Prepare your dataset.
- Put your focal stacks and metadata in a scene folder under 'End_to_End/Datasets/'.
- The metadata files should be named 'focus_distance.txt' and 'focal_length.txt', containing the focus distances and the focal length of the lens, respectively (see the sketch after this list).
- The pretrained weights were trained on a synthetic dataset whose focal stacks each consist of 10 images generated by our simulator.
- Please refer to the uploaded example dataset (folder 'ball').
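The exact layout of the two metadata files is easiest to check against the uploaded 'ball' example; the sketch below assumes a simple one-value-per-line format (one focus distance per image in the stack, and a single focal-length value), which is our assumption for illustration rather than a specification from the repository:

```python
# read_meta.py -- illustrative reader for the per-scene metadata files (assumed format; verify against the 'ball' example).
from pathlib import Path

scene = Path("End_to_End/Datasets/ball")   # the example scene shipped with the repo

# Assumption: one focus distance per line, ordered like the images in the focal stack.
focus_distances = [float(x) for x in (scene / "focus_distance.txt").read_text().split()]

# Assumption: a single focal-length value for the lens.
focal_length = float((scene / "focal_length.txt").read_text().split()[0])

print(len(focus_distances), "focus distances, focal length =", focal_length)
```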
2. Run test_real_scenes.py in 'End_to_End'
python test_real_scenes.py
Results
- Before alignment network (input focal stack)
<img src="./Results/Balls_before_warp.gif" width="40%" height="40%"> <img src="./Results/Plants_before_warp.gif" width="40%" height="40%">
- After alignment network (aligned focal stack)
<img src="./Results/Balls_after_warp.gif" width="40%" height="40%"> <img src="./Results/Plants_after_warp.gif" width="40%" height="40%">
- After depth estimation network (depth map)
<img src="./Results/ball_depth.jpg" width="40%" height="40%"> <img src="./Results/plants_depth.jpg" width="40%" height="40%">
Limitations
- Our network handles only three basic motions, which cannot cover all possible motions, so performance degrades on complex motions.
- There is a domain gap between the synthetic dataset generated by the simulator and real datasets. Please note that adjusting the simulator parameters and tuning the number of training epochs may yield better results than ours.
Sources
[1] Chang, Jia-Ren, and Yong-Sheng Chen. "Pyramid stereo matching network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. code paper
[2] Shen, Zhelun, Yuchao Dai, and Zhibo Rao. "Cfnet: Cascade and fused cost volume for robust stereo matching." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. code paper
[3] Abuolaim, Abdullah, et al. "Learning to reduce defocus blur by realistically modeling dual-pixel data." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. code paper
[4] Silberman, Nathan, et al. "Indoor segmentation and support inference from rgbd images." European conference on computer vision. Springer, Berlin, Heidelberg, 2012. page
[5] Hazirbas, Caner, et al. "Deep depth from focus." Asian Conference on Computer Vision. Springer, Cham, 2018. code paper
[6] Maximov, Maxim, Kevin Galim, and Laura Leal-Taixé. "Focus on defocus: bridging the synthetic to real domain gap for depth estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. code paper
[7] Wang, Ning-Hsu, et al. "Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. code paper
[8] Scharstein, Daniel, et al. "High-resolution stereo datasets with subpixel-accurate ground truth." German conference on pattern recognition. Springer, Cham, 2014. page
[9] Honauer, Katrin, et al. "A dataset and evaluation methodology for depth estimation on 4D light fields." Asian Conference on Computer Vision. Springer, Cham, 2016. page
[10] Herrmann, Charles, et al. "Learning to autofocus." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. page