Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV


This repository contains the code associated with the following publications:

Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV

Jaime Spencer, Chris Russell, Simon Hadfield and Richard Bowden

ArXiv (ArXiv 2024)

Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV

Jaime Spencer, Chris Russell, Simon Hadfield and Richard Bowden

ArXiv (ICCV 2023)

Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter

Jaime Spencer, Chris Russell, Simon Hadfield and Richard Bowden

ArXiv (TMLR 2022)

We have organized several monocular depth prediction challenges around the proposed SYNS-Patches dataset. Check the MDEC website for details on previous editions!

<p align="center"> <img src="./assets/syns/image_0026.png" alt="image_0026" width="32%"/> <img src="./assets/syns/image_0254.png" alt="image_0254" width="32%"/> <img src="./assets/syns/image_0698.png" alt="image_0698" width="32%"/> <img src="./assets/syns/depth_0026.png" alt="depth_0026" width="32%"/> <img src="./assets/syns/depth_0254.png" alt="depth_0254" width="32%"/> <img src="./assets/syns/depth_0698.png" alt="depth_0698" width="32%"/> </p> <p align="center"> <img src="./assets/slowtv/00_natural.png" alt="image_0026" width="32%"/> <img src="./assets/slowtv/00_driving.png" alt="image_0254" width="32%"/> <img src="./assets/slowtv/00_underwater.png" alt="image_0698" width="32%"/> <img src="./assets/slowtv/03_natural.png" alt="depth_0026" width="32%"/> <img src="./assets/slowtv/03_driving.png" alt="depth_0254" width="32%"/> <img src="./assets/slowtv/03_underwater.png" alt="depth_0698" width="32%"/> </p>

Project Structure

* Not tracked by Git!


Pretrained Checkpoints

You can download the full pretrained models from the following Dropbox link:

We also provide a minimum-requirements script to load a pretrained model and compute predictions on a directory of images. This is probably what you want if you just want to try out the model, rather than training it yourself. Code illustrating how to align the predictions to a ground-truth depth map can be found here.

The only requirements for running the model are timm, torch and numpy.
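
For reference, the sketch below shows a generic predict-and-align loop of the kind that script implements. It is illustrative only: it assumes the model has already been loaded via the provided script, and the preprocessing, helper names (predict_dir, align_median) and median-scaling alignment are common conventions rather than the repository's exact code.

# Illustrative sketch only: `model` is assumed to be a depth network already loaded
# via the provided minimum-requirements script. Preprocessing and median scaling are
# generic conventions, not necessarily the repository's exact pipeline.
from pathlib import Path

import numpy as np
import torch
from PIL import Image  # used here only to read the input images

@torch.no_grad()
def predict_dir(model: torch.nn.Module, img_dir: str, device: str = 'cuda') -> dict:
    """Run the network on every PNG in `img_dir` and return per-image depth maps."""
    model = model.eval().to(device)
    preds = {}
    for file in sorted(Path(img_dir).glob('*.png')):
        img = np.asarray(Image.open(file).convert('RGB'), dtype=np.float32) / 255.
        x = torch.from_numpy(img).permute(2, 0, 1)[None].to(device)  # (1, 3, H, W)
        preds[file.name] = model(x).squeeze().cpu().numpy()
    return preds

def align_median(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Median-scale a scale-ambiguous prediction to a ground-truth depth map."""
    mask = gt > 0  # evaluate only on pixels with valid ground truth
    return pred * np.median(gt[mask]) / np.median(pred[mask])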


MapFreeReloc

You can download the val/test MapFreeReloc predictions for our public models from:

These can be used in your own MapFreeReloc submission to replace the baseline DPT+KITTI. Please remember to cite us if doing so!


Getting Started

Each section of the code has its own README file with more detailed instructions. Follow them only after completing the remaining steps in this section.

PYTHONPATH

Remember to add the repository root to your PYTHONPATH in order to run the code.

# Example for `bash`. Can be added to `~/.bashrc`.
export PYTHONPATH=/path/to/slowtv_monodepth:$PYTHONPATH

Git Hooks

First, set up the Git pre-commit hook that prevents committing Jupyter notebooks with outputs, since these may contain large images.

./.git-hooks/setup.sh
chmod +x .git/hooks/pre-commit  # File sometimes isn't copied as executable. This should fix it. 

Anaconda

If using Anaconda/Miniconda, create the environment and run commands as follows:

ENV_NAME=slowtv
conda env create --file docker/environment.yml
conda activate $ENV_NAME
python api/train/train.py ...

Docker

To use Docker instead, build the image and start a container as

docker build -t $ENV_NAME ./docker
docker run -it \
    --shm-size=24gb \
    --gpus all \
    -v $(pwd -P):$(pwd -P) \
    -v /path/to/dataroot1:/path/to/dataroot1 \
    --user $(id -u):$(id -g) \
    $ENV_NAME:latest \
    /bin/bash

python api/train/train.py ...

Paths

The default locations for datasets and model checkpoints are ./data & ./models, respectively. If you want to store them somewhere else, you can either create symlinks to them, or add additional roots. This is done by creating the ./PATHS.yaml file with the following contents:

# -----------------------------------------------------------------------------
MODEL_ROOTS: 
  - /path/to/modelroot1

DATA_ROOTS:
  - /path/to/dataroot1
  - /path/to/dataroot2
  - /path/to/dataroot3
# -----------------------------------------------------------------------------

NOTE: This file should not be tracked by Git, as it may contain sensitive information about your machine.

Multiple roots may be useful if training in an HPC cluster where data has to be copied locally. Roots should be listed in order of preference, i.e. dataroot1/kitti_raw_syns will be given preference over dataroot2/kitti_raw_syns.
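
The lookup behaves roughly as in the sketch below. This is illustrative only: the helper name (find_root) and the use of PyYAML are assumptions, not the repository's internals.

# Illustrative only: resolve a dataset/model directory against the roots listed in
# ./PATHS.yaml, trying them in order of preference. Helper name is hypothetical.
from pathlib import Path

import yaml  # PyYAML

def find_root(name: str, key: str = 'DATA_ROOTS', cfg: str = './PATHS.yaml') -> Path:
    """Return `<root>/<name>` for the first listed root that contains it."""
    roots = yaml.safe_load(Path(cfg).read_text())[key]
    for root in roots:
        candidate = Path(root) / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f'"{name}" not found under any of: {roots}')

# e.g. find_root('kitti_raw_syns') returns dataroot1/kitti_raw_syns if it exists,
# otherwise falls back to dataroot2/kitti_raw_syns, and so on.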

Results

We provide the YAML files containing the precomputed results used in the paper. These should be copied over to the ./models directory (or any desired root) in order to follow the structure required by the evaluation and table-generating scripts.

cp -r ./results/* ./models

Citation

If you use the code in this repository or find the papers interesting, please cite them as

@inproceedings{spencer2024cribstv,
  title={Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV},
  author={Jaime Spencer and Chris Russell and Simon Hadfield and Richard Bowden},
  booktitle={ArXiv Preprint},
  year={2024}
}

@inproceedings{spencer2023slowtv,
  title={Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV},
  author={Jaime Spencer and Chris Russell and Simon Hadfield and Richard Bowden},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

@article{spencer2022deconstructing,
  title={Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter},
  author={Jaime Spencer and Chris Russell and Simon Hadfield and Richard Bowden},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2022},
  url={https://openreview.net/forum?id=GFK1FheE7F},
  note={Reproducibility Certification}
}

References

We would also like to thank the authors of the papers below for their contributions and for releasing their code. Please consider citing them in your own work.

| Tag | Title | Author | Conf | ArXiv | GitHub |
|---|---|---|---|---|---|
| Garg | Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue | Garg et al. | ECCV 2016 | ArXiv | GitHub |
| Monodepth | Unsupervised Monocular Depth Estimation with Left-Right Consistency | Godard et al. | CVPR 2017 | ArXiv | GitHub |
| Kuznietsov | Semi-Supervised Deep Learning for Monocular Depth Map Prediction | Kuznietsov et al. | CVPR 2017 | ArXiv | GitHub |
| SfM-Learner | Unsupervised Learning of Depth and Ego-Motion from Video | Zhou et al. | CVPR 2017 | ArXiv | GitHub |
| Depth-VO-Feat | Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction | Zhan et al. | CVPR 2018 | ArXiv | GitHub |
| DVSO | Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry | Yang et al. | ECCV 2018 | ArXiv | |
| Klodt | Supervising the new with the old: learning SFM from SFM | Klodt & Vedaldi | ECCV 2018 | CVF | |
| MonoResMatch | Learning monocular depth estimation infusing traditional stereo knowledge | Tosi et al. | CVPR 2019 | ArXiv | GitHub |
| DepthHints | Self-Supervised Monocular Depth Hints | Watson et al. | ICCV 2019 | ArXiv | GitHub |
| Monodepth2 | Digging Into Self-Supervised Monocular Depth Estimation | Godard et al. | ICCV 2019 | ArXiv | GitHub |
| SuperDepth | SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation | Pillai et al. | ICRA 2019 | ArXiv | GitHub |
| Johnston | Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume | Johnston & Carneiro | CVPR 2020 | ArXiv | |
| FeatDepth | Feature-metric Loss for Self-supervised Learning of Depth and Egomotion | Shu et al. | ECCV 2020 | ArXiv | GitHub |
| CADepth | Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Yan et al. | 3DV 2021 | ArXiv | GitHub |
| DiffNet | Self-Supervised Monocular Depth Estimation with Internal Feature Fusion | Zhou et al. | BMVC 2021 | ArXiv | GitHub |
| HR-Depth | HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation | Lyu et al. | AAAI 2021 | ArXiv | GitHub |
| MiDaS | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer | Ranftl et al. | PAMI 2020 | ArXiv | GitHub |
| DPT | Vision Transformers for Dense Prediction | Ranftl et al. | ICCV 2021 | ArXiv | GitHub |
| NeWCRFs | NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation | Yuan et al. | CVPR 2022 | ArXiv | GitHub |

Licence

This project is licensed under the Commons Clause and GNU GPL licences. For commercial use, please contact the authors.