Unsupervised Depth Completion from Visual Inertial Odometry

Project VOICED: Depth Completion from Inertial Odometry and Vision

Tensorflow and PyTorch implementations of Unsupervised Depth Completion from Visual Inertial Odometry

Published in RA-L January 2020 and ICRA 2020

[arxiv] [poster] [talk]

Tensorflow models have been tested on Ubuntu 16.04 using Python 3.5 and 3.6, Tensorflow 1.14 and 1.15, and CUDA 10.0

PyTorch models have been tested on Ubuntu 20.04 using Python 3.7 and 3.8, PyTorch 1.10, and CUDA 11.1

Authors: Alex Wong, Xiaohan Fei, Stephanie Tsuei

If you use this work, please cite our paper:

@article{wong2020unsupervised,
  title={Unsupervised Depth Completion From Visual Inertial Odometry},
  author={Wong, Alex and Fei, Xiaohan and Tsuei, Stephanie and Soatto, Stefano},
  journal={IEEE Robotics and Automation Letters},
  volume={5},
  number={2},
  pages={1899--1906},
  year={2020},
  publisher={IEEE}
}

Looking for our latest work in unsupervised depth completion?

Check out our RA-L 2021 and ICRA 2021 paper, ScaffNet: Learning Topology from Synthetic Data for Unsupervised Depth Completion

ScaffNet is trained on synthetic data (SceneNet), but is able to generalize to novel real data (VOID and NYUv2)!

Also, check out our ICCV 2021 oral paper, KBNet: Unsupervised Depth Completion with Calibrated Backprojection Layers

KBNet runs at 15 ms/frame (67 fps) and improves over VOICED by 51.7% indoors (VOID) and 13.7% outdoors (KITTI)!

Table of Contents

  1. About sparse-to-dense depth completion
  2. About VOICED
  3. Setting up for Tensorflow implementation
  4. Setting up for PyTorch implementation
  5. Related projects
  6. License and disclaimer

About sparse-to-dense depth completion <a name="about-sparse-to-dense"></a>

In the sparse-to-dense depth completion problem, we seek to infer the dense depth map of a 3-D scene from an RGB image and its associated sparse depth measurements, given in the form of a sparse depth map. These measurements are obtained either from computational methods such as structure-from-motion (SfM) or from active sensors such as lidar or structured-light sensors.
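
To make the input format concrete, below is a minimal sketch (using numpy; the image size, point count, and depth values are made up for illustration, not taken from any dataset) of how a sparse depth map is commonly represented and how its density is computed:

```python
import numpy as np

# Hypothetical example: image dimensions and depth values are made up.
height, width = 480, 640

# A sparse depth map stores a metric depth value at a small set of
# pixels and zero everywhere else.
sparse_depth = np.zeros((height, width), dtype=np.float32)
num_points = 1500
rows = np.random.randint(0, height, size=num_points)
cols = np.random.randint(0, width, size=num_points)
sparse_depth[rows, cols] = np.random.uniform(0.5, 5.0, size=num_points)

# Density = fraction of pixels carrying a measurement (~0.5% here,
# comparable to the indoor sparsity levels discussed below).
density = (sparse_depth > 0).mean()
print(f"density: {density:.2%}")
```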

| Input RGB image from the VOID dataset | Densified depth map -- colored and back-projected to 3-D |
| :---: | :---: |
| <img src="figures/void_teaser.jpg" width="400"> | <img src="figures/void_teaser.gif"> |

| Input RGB image from the KITTI dataset | Densified depth map -- colored and back-projected to 3-D |
| :---: | :---: |
| <img src="figures/kitti_teaser.jpg" width="400"> | <img src="figures/kitti_teaser.gif"> |
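
The captions above refer to back-projecting the densified depth to 3-D. As a quick illustration, here is a minimal sketch of standard pinhole-camera backprojection; the intrinsics are hypothetical and this is not code from this repository:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    # Pinhole model: a pixel (u, v) with depth z maps to the 3-D point
    # X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z.
    height, width = depth.shape
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)  # (H, W, 3)
    return points[depth > 0]  # keep only pixels with valid depth

# Hypothetical intrinsics, for illustration only.
dense_depth = np.random.uniform(0.5, 5.0, size=(480, 640)).astype(np.float32)
cloud = backproject(dense_depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(cloud.shape)  # (N, 3) point cloud
```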

To follow the literature and benchmarks for this task, you may visit: Awesome State of Depth Completion

About VOICED <a name="about-voiced"></a>

VOICED is an unsupervised depth completion method built on top of XIVO. Unlike previous methods, we build a scaffolding of the scene from the sparse depth measurements (~5% density for outdoor driving scenarios like KITTI and ~0.5% to ~0.05% for indoor scenes like VOID) and refine the scaffolding using a light-weight network.

<p align="center"> <img align="center" src="figures/digest_teaser_horizontal.png" width="800"> </p>

This paradigm allows us to achieve state-of-the-art performance on the unsupervised depth completion task while using up to 80% fewer parameters than prior art. As an added bonus, our approach does not require top-of-the-line GPUs (e.g., Tesla V100, Titan V) and can be deployed on much cheaper hardware.
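
To give a rough sense of the scaffolding idea described above, here is a minimal sketch that interpolates the sparse measurements into an initial dense estimate using scipy's generic griddata interpolation; this is an illustrative stand-in, not the scaffolding implementation used in VOICED:

```python
import numpy as np
from scipy.interpolate import griddata

def scaffold(sparse_depth):
    """Interpolate sparse depth into an initial dense scaffolding."""
    height, width = sparse_depth.shape
    rows, cols = np.nonzero(sparse_depth)  # pixels with measurements
    values = sparse_depth[rows, cols]

    # Query every pixel: linear interpolation inside the convex hull
    # of the sparse points, nearest-neighbor fill elsewhere.
    grid_r, grid_c = np.mgrid[0:height, 0:width]
    dense = griddata((rows, cols), values, (grid_r, grid_c), method='linear')
    holes = np.isnan(dense)
    dense[holes] = griddata((rows, cols), values,
                            (grid_r[holes], grid_c[holes]), method='nearest')
    return dense.astype(np.float32)
```

Because the network only refines an initial estimate rather than regressing depth from scratch, it can remain light-weight.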

Setting up for Tensorflow implementation <a name="setting-up-tensorflow"></a>

For the original Tensorflow implementation used in Unsupervised Depth Completion from Visual Inertial Odometry, please visit VOICED Tensorflow. Note that the Tensorflow implementation is written for Tensorflow 1, not 2. We will stop supporting the Tensorflow 1 version as of this commit. We currently do not have plans to support Tensorflow 2, but may revisit this in the future if there is enough interest in that platform. If you are interested in future versions of this work, we encourage you to use the PyTorch version (see below).

Note: Dataset setup and data handling in the Tensorflow version follow the original release of the code. To ensure that the code works properly, please treat the tensorflow directory as the root of the Tensorflow code repository.

Setting up for PyTorch implementation <a name="setting-up-pytorch"></a>

We have released a PyTorch re-implementation of Unsupervised Depth Completion from Visual Inertial Odometry. Although hyper-parameters may differ, the implementation is faithful to the original; any changes needed to reproduce the results stem from subtle differences between the Tensorflow and PyTorch platforms. Please see VOICED PyTorch for source code and instructions. As our group has migrated to PyTorch as our main platform, we will continue to support this re-implementation, but will discontinue support for Tensorflow.

Note: The PyTorch version follows the implementation pattern in KBNet and MonDi and hence dataset (KITTI, VOID) setup and data loading functions will differ from the Tensorflow version. To ensure that the code works properly, please treat the pytorch directory as the root of the PyTorch code repository.

Coming soon! We will release pre-trained models for the PyTorch re-implementation in the upcoming months. Stay tuned!

Related projects <a name="related-projects"></a>

You may also find the following projects useful:

We also have work on adversarial attacks against depth estimation methods and on medical image segmentation:

License and disclaimer <a name="license-disclaimer"></a>

This software is property of the UC Regents, and is provided free of charge for research purposes only. It comes with no warranties, expressed or implied, according to these terms and conditions. For commercial use, please contact UCLA TDG.