Single Image Depth Prediction with Wavelet Decomposition

Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit and Daniyar Turmukhambetov

CVPR 2021

[Link to paper]

<p align="center"> <img src="assets/combo_kitti.gif" alt="kitti gif" width="500" /> <img src="assets/combo_nyu.gif" alt="nyu gif" width="300" /> </p>

We introduce WaveletMonoDepth, which improves the efficiency of standard encoder-decoder monocular depth estimation methods by exploiting wavelet decomposition.

<p align="center"> <a href="https://storage.googleapis.com/niantic-lon-static/research/wavelet-monodepth/5min.mp4"> <img src="assets/video_thumbnail.png" alt="5 minute CVPR presentation video link" width="400"> </a> </p>

🧑‍🏫 Methodology

WaveletMonoDepth was implemented for two benchmarks, KITTI and NYUv2. For each dataset, we build our code on top of an existing baseline. Both baselines share a common encoder-decoder architecture, and we modify their decoders to predict wavelet coefficients.

Wavelet predictions are sparse, so they only need to be computed at relevant locations, which saves a lot of unnecessary computation.
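To illustrate why wavelet coefficients of a depth map tend to be sparse, here is a minimal pure-Python sketch of a one-level 2D Haar decomposition and its inverse. It is only an illustration under stated assumptions: the function names `haar_dwt2` and `haar_idwt2`, the toy depth map, and the choice of the Haar wavelet with an average/half-difference normalization are hypothetical and are not taken from this repository.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of a 2D list with even height and width.

    Returns (LL, LH, HL, HH), each half the size of the input.
    """
    h, w = len(img), len(img[0])
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 4)  # local average
            rows[1].append((a + b - c - d) / 4)  # top minus bottom
            rows[2].append((a - b + c - d) / 4)  # left minus right
            rows[3].append((a - b - c + d) / 4)  # diagonal difference
        for band, row in zip((LL, LH, HL, HH), rows):
            band.append(row)
    return LL, LH, HL, HH


def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2: rebuild the full-resolution map."""
    h, w = len(LL), len(LL[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            ll, lh, hl, hh = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            img[2 * i][2 * j] = ll + lh + hl + hh
            img[2 * i][2 * j + 1] = ll + lh - hl - hh
            img[2 * i + 1][2 * j] = ll - lh + hl - hh
            img[2 * i + 1][2 * j + 1] = ll - lh - hl + hh
    return img


# Toy depth map (hypothetical): a near surface at 2 m next to a far one at 5 m.
depth = [[2.0, 2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0] for _ in range(4)]

LL, LH, HL, HH = haar_dwt2(depth)
details = [v for band in (LH, HL, HH) for row in band for v in row]
nonzero = sum(1 for v in details if v != 0)
# Only the 2 coefficients straddling the depth edge are non-zero, out of 24.
print(f"{nonzero} of {len(details)} detail coefficients are non-zero")
```

In this toy case the detail bands are zero everywhere except along the depth discontinuity, which is the property that lets a decoder skip most locations.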

<p align="center"> <img src="assets/architecture.png" alt="our architecture" width="700" /> </p>

The network is first trained with dense convolutions in the decoder until convergence, and the dense convolutions are then replaced with sparse ones.

This is because the network first needs to learn to predict sparse wavelet coefficients before we can use sparse convolutions.

🗂 Environment Requirements 🗂

We recommend creating a new Anaconda environment to use WaveletMonoDepth. Use the following commands to set up a new environment:

conda env create -f environment.yml
conda activate wavelet-mdp

Our work uses Pytorch Wavelets, a great package from Fergal Cotter which implements the Inverse Discrete Wavelet Transform (IDWT) used in our work, and a lot more! To install Pytorch Wavelets, simply run:

git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .

🚗🚦 KITTI 🌳🛣

Depth Hints was used as a baseline for KITTI.

Depth Hints builds upon monodepth2. If you have questions about running the code, please see the issues in their repositories first.

⚙ Setup, Training and Evaluation

Please see the KITTI directory of this repository for details on how to train and evaluate our method.

📊 Results 📦 Trained models

Please find below the scores using dense convolutions to predict wavelet coefficients. Download links coming soon!

| Model name | Training modality | Resolution | abs_rel | RMSE | δ<1.25 |
| --- | --- | --- | --- | --- | --- |
| Ours Resnet18 | Stereo + DepthHints | 640 x 192 | 0.106 | 4.693 | 0.876 |
| Ours Resnet50 | Stereo + DepthHints | 640 x 192 | 0.105 | 4.625 | 0.879 |
| Ours Resnet18 | Stereo + DepthHints | 1024 x 320 | 0.102 | 4.452 | 0.890 |
| Ours Resnet50 | Stereo + DepthHints | 1024 x 320 | 0.097 | 4.387 | 0.891 |
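For readers unfamiliar with the abs_rel, RMSE and δ<1.25 columns, the sketch below computes these metrics with their standard definitions from the depth estimation literature. It is not the evaluation code of this repository: the function name `depth_metrics` and the sample values are hypothetical, and real evaluation protocols typically also mask invalid pixels and clip the depth range, which is omitted here.

```python
import math


def depth_metrics(pred, gt):
    """Return (abs_rel, rmse, delta_1) for equal-length lists of positive depths."""
    n = len(gt)
    pairs = list(zip(pred, gt))
    # Mean absolute error relative to the ground truth (lower is better).
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error (lower is better).
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose ratio to the ground truth is below 1.25 (higher is better).
    delta_1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return abs_rel, rmse, delta_1


# Made-up depths for illustration only.
gt = [2.0, 4.0, 10.0, 20.0]
pred = [2.2, 4.0, 9.0, 30.0]

abs_rel, rmse, delta_1 = depth_metrics(pred, gt)
print(f"abs_rel={abs_rel:.3f} rmse={rmse:.3f} delta<1.25={delta_1:.2f}")
```

With these made-up values, three of the four predictions fall within a factor of 1.25 of the ground truth, so δ<1.25 is 0.75.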

🎚 Playing with sparsity

The most interesting part, however, is that we can exploit the sparsity of the predicted wavelet coefficients to trade accuracy for efficiency, at a minimal cost in performance. We do so by tuning the wavelet threshold:

<p align="center"> <img src="assets/kitti_sparsify.gif" alt="sparsify kitti" width="500" /> </p>

Computing coefficients at only 10% of the pixels in the decoding process gives a relative score loss of less than 1.4%.
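The idea of thresholding coefficients can be sketched as follows. This is a simplified illustration and not the repository's implementation: the helper names, the toy coefficient values, the threshold, and the two scores passed to `relative_score_loss` are all hypothetical, and the definition of relative score loss used here (relative change of an error metric between the dense and sparse predictions) is an assumption.

```python
def sparsity_mask(coeffs, threshold):
    """Mark the pixels whose coefficient magnitude exceeds the threshold."""
    return [[abs(v) > threshold for v in row] for row in coeffs]


def kept_fraction(mask):
    """Fraction of pixels at which coefficients would still be computed."""
    total = sum(len(row) for row in mask)
    return sum(1 for row in mask for keep in row if keep) / total


def relative_score_loss(dense_score, sparse_score):
    """Relative change, in percent, of an error metric where lower is better."""
    return 100.0 * (sparse_score - dense_score) / dense_score


# Toy detail coefficients (made up): large values only along a depth edge.
coeffs = [
    [0.00, 0.01, -0.30, 0.02, 0.00],
    [0.00, 0.00, -0.28, 0.01, 0.00],
    [0.01, 0.00, 0.02, 0.00, 0.00],
    [0.00, -0.01, 0.00, 0.00, 0.01],
]

mask = sparsity_mask(coeffs, threshold=0.05)
fraction = kept_fraction(mask)
print(f"coefficients kept at {fraction:.0%} of the pixels")

# Made-up error values, only to show how the relative loss is computed.
loss = relative_score_loss(dense_score=0.100, sparse_score=0.101)
print(f"relative score loss: {loss:.1f}%")
```

Raising the threshold shrinks the mask and therefore the amount of computation, while the relative score loss measures how much accuracy is given up in exchange.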

<p align="center"> <img src="assets/relative_score_loss_kitti.png" alt="scores kitti" width="500" /> </p>

Our wavelet-based method allows us to greatly reduce the number of computations in the decoder at a minimal cost in performance. We can measure the performance-vs-efficiency trade-off by evaluating scores against FLOPs.
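As a rough back-of-the-envelope illustration of how sparsity translates into FLOPs, the sketch below estimates the cost of a single convolution evaluated at only a fraction of the output pixels. The function name `conv2d_flops`, the layer sizes, and the convention of counting one multiply-add as two FLOPs are assumptions made for this example; the estimate ignores biases and any overhead of sparse convolution itself, so it is not the FLOPs measurement used for the plot.

```python
def conv2d_flops(height, width, in_channels, out_channels, kernel_size,
                 active_fraction=1.0):
    """Approximate FLOPs of a stride-1 convolution computed at a fraction of pixels.

    One multiply-add is counted as two FLOPs (an assumed convention).
    """
    macs_per_pixel = in_channels * out_channels * kernel_size * kernel_size
    active_pixels = height * width * active_fraction
    return 2 * macs_per_pixel * active_pixels


# Hypothetical decoder layer: 64 -> 32 channels, 3x3 kernel, 640 x 192 output.
dense = conv2d_flops(192, 640, 64, 32, 3)
sparse = conv2d_flops(192, 640, 64, 32, 3, active_fraction=0.1)

print(f"dense:  {dense / 1e9:.2f} GFLOPs")
print(f"sparse: {sparse / 1e9:.2f} GFLOPs ({sparse / dense:.0%} of dense)")
```

Under these assumptions, the cost of the layer scales linearly with the fraction of pixels at which coefficients are computed.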

<p align="center"> <img src="assets/score_vs_flops.png" alt="scores vs flops kitti" width="500" /> </p>

🪑🛏 NYUv2 🛋🚪

Dense Depth was used as a baseline for NYUv2. Note that we used the experimental PyTorch implementation of DenseDepth. Compared to the original paper, we made a few modifications:

⚙ Setup, Training and Evaluation

Please see the NYUv2 directory of this repository for details on how to train and evaluate our method.

📊 Results and 📦 Trained models

Please find below the scores and associated trained models, using dense convolutions to predict wavelet coefficients.

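| Model name | Encoder | Resolution | abs_rel | RMSE | δ<1.25 | ε_acc |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | DenseNet | 640 x 480 | 0.1277 | 0.5479 | 0.8430 | 1.7170 |
| Ours | DenseNet | 640 x 480 | 0.1258 | 0.5515 | 0.8451 | 1.8070 |
| Baseline | MobileNetv2 | 640 x 480 | 0.1772 | 0.6638 | 0.7419 | 1.8911 |
| Ours | MobileNetv2 | 640 x 480 | 0.1727 | 0.6776 | 0.7380 | 1.9732 |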

🎚 Playing with sparsity

As with the KITTI dataset, we can tune the wavelet threshold to greatly reduce computation at a minimal cost in performance.

<p align="center"> <img src="assets/nyu_sparsify.gif" alt="sparsify nyu" width="500" /> </p>

Computing coefficients at only 5% of the pixels in the decoding process gives a relative depth score loss of less than 0.15%.

<p align="center"> <img src="assets/relative_score_loss_nyu.png" alt="scores nyu" width="500" /> </p>

🎮 Try it yourself!

Try using our Jupyter notebooks to visualize results with different levels of sparsity, as well as compute the resulting computational saving in FLOPs. Notebooks can be found in <DATASET>/sparsity_test_notebook.ipynb where <DATASET> is either KITTI or NYUv2.

✏️ 📄 Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{ramamonjisoa-2021-wavelet-monodepth,
  title     = {Single Image Depth Prediction with Wavelet Decomposition},
  author    = {Ramamonjisoa, Micha{\"{e}}l and
               Michael Firman and
               Jamie Watson and
               Vincent Lepetit and
               Daniyar Turmukhambetov},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  month = {June},
  year = {2021}
}

👩‍⚖️ License

Copyright © Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.