Robust Learning Through Cross-Task Consistency <br>

<em>Above: A comparison of the results from consistency-based learning and learning each task individually. The yellow markers highlight the improvement in fine-grained details.</em>

This repository contains tools for training and evaluating models using cross-task consistency, for the following paper:

<div style="text-align:center"> <h4><a href=https://consistency.epfl.ch>Robust Learning Through Cross-Task Consistency</a> (CVPR 2020, Best Paper Award Nomination, Oral)</h4> </div> <br>

Cross-Task Consistency Results

For further details, a live demo, video visualizations, and an overview talk, refer to our project website.

PROJECT WEBSITE:

| LIVE DEMO | VIDEO VISUALIZATION |
| --- | --- |
| Upload your own images and see the results of different consistency-based models vs. various baselines.<br><br><img src=./assets/screenshot-demo.png width="400"> | Visualize models with and without consistency, evaluated on a (non-cherry-picked) YouTube video.<br><br><img src=./assets/output_video.gif width="400"> |

Table of Contents

- Introduction
- Installation
- Quickstart (Run Demo Locally)
- Energy Computation
- Pretrained Models
- Training
- Citation

<br>

Introduction

Visual perception entails solving a wide set of tasks (e.g., object detection, depth estimation, etc.). The predictions made for each task from a particular observation are not independent and, therefore, are expected to be consistent.

What is consistency? Suppose an object detector detects a ball in a particular region of an image, while a depth estimator returns a flat surface for the same region. This presents an issue -- at least one of them has to be wrong, because they are inconsistent.

Why is it important?

  1. Desired learning tasks are usually predictions of different aspects of a single underlying reality (the scene that underlies an image). Inconsistency among predictions implies contradiction.
  2. Consistency constraints are informative and can be used to better fit the data or lower the sample complexity. They may also reduce the tendency of neural networks to learn "surface statistics" (superficial cues) by enforcing constraints rooted in different physical or geometric rules. This is empirically supported by the improved generalization of models when trained with consistency constraints.

How do we enforce it? The underlying concept is that of path independence in a network of tasks. Given an endpoint Y2, the path X->Y1->Y2 should give the same result as the direct path X->Y2. This can be generalized to a larger system with paths of arbitrary lengths. In this case, the nodes of the graph are our prediction domains (e.g. depth, normals) and the edges are neural networks mapping between these domains.
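As a concrete illustration, here is a minimal sketch of such a consistency loss in PyTorch. This is not the repository's training code: the networks `n` (RGB -> normal) and `f` (normal -> curvature) are placeholder stand-ins for two edges of the task graph, and `f` is frozen so that it acts as a fixed "perceptual" mapping.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for two edges of the task graph:
#   n: RGB -> normal       (the network being trained)
#   f: normal -> curvature (a frozen "perceptual" transfer network)
n = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # placeholder for a UNet
f = nn.Conv2d(3, 1, kernel_size=3, padding=1)   # placeholder for a transfer UNet
for p in f.parameters():
    p.requires_grad = False                     # consistency targets are frozen

l1 = nn.L1Loss()

def consistency_loss(x, y_gt):
    """Supervised L1 term plus one cross-task consistency term."""
    y_pred = n(x)                               # X -> Y1  (RGB -> normal)
    direct = l1(y_pred, y_gt)                   # standard supervised loss
    # Path independence: mapping the prediction and the ground truth into the
    # curvature domain should give the same result, i.e. f(n(x)) ~ f(y).
    consistency = l1(f(y_pred), f(y_gt))
    return direct + consistency

x = torch.randn(2, 3, 256, 256)                 # a batch of RGB images
y_gt = torch.randn(2, 3, 256, 256)              # ground-truth surface normals
loss = consistency_loss(x, y_gt)
loss.backward()
```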

This repository includes training code for enforcing cross-task consistency, demo code for visualizing the results of a consistency-trained model on a given image, and links to download these models. For further details, refer to our paper or website.

Consistency Domains

Consistency constraints can be used for virtually any set of domains. This repository considers transfers between image domains; our networks were trained to transfer between the following domains from the Taskonomy dataset.

Curvature         Edge-3D            Reshading
Depth-ZBuffer     Keypoint-2D        RGB       
Edge-2D           Keypoint-3D        Surface-Normal 

The repo contains consistency-trained models for RGB -> Surface-Normal, RGB -> Depth-ZBuffer, and RGB -> Reshading. In each case, the remaining 7 domains are used as consistency constraints during training.

Descriptions for each domain can be found in the supplementary file of Taskonomy.

Network Architecture

All networks are based on the UNet architecture. They take an input of size 256x256, upsampling is done via bilinear interpolation instead of deconvolution, and the models are trained with the L1 loss. See the table below for more information.

| Task Name | Output Dimension | Downsample Blocks |
| --- | --- | --- |
| RGB -> Depth-ZBuffer | 256x256x1 | 6 |
| RGB -> Reshading | 256x256x1 | 5 |
| RGB -> Surface-Normal | 256x256x3 | 6 |

Other networks (e.g. Curvature -> Surface-Normal) also use a UNet; their architecture hyperparameters are detailed in transfers.py.
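As an illustration of the bilinear-upsampling choice mentioned above, below is a minimal sketch of a single UNet decoder step in PyTorch. This is not the code from the repository's modules/, and the channel sizes are arbitrary.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One UNet decoder step: bilinear upsample (instead of a deconvolution),
    concatenate the skip connection, then convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                      # 2x spatial upsampling, no learned kernel
        x = torch.cat([x, skip], dim=1)     # UNet skip connection
        return self.conv(x)

block = UpBlock(in_ch=64, skip_ch=32, out_ch=32)
out = block(torch.randn(1, 64, 64, 64), torch.randn(1, 32, 128, 128))
print(out.shape)  # torch.Size([1, 32, 128, 128])
```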

More information on the models, including download links, can be found here and in the supplementary material.

<br> <br>

Installation

There are two convenient ways to run the code: using Docker (recommended), or using a Python package manager such as pip, conda, or virtualenv.

Installation via Docker [Recommended]

We provide a Docker image that contains the code and all the necessary libraries. It's simple to install and run.

  1. Simply run:
docker run --runtime=nvidia -ti --rm epflvilab/xtconsistency:latest

The code is now available in the container under your home directory (/app), and all the necessary libraries should already be installed.

Installation via Pip/Conda/Virtualenv

The code can also be run using a Python environment manager such as Conda. See requirements.txt for a complete list of packages. We recommend a clean installation of the requirements in a new environment:

  1. Clone the repo:
git clone git@github.com:EPFL-VILAB/XTConsistency.git
cd XTConsistency
  2. Create a new environment and install the libraries:
conda create -n testenv -y python=3.6
source activate testenv
pip install -r requirements.txt
<br> <br>

Quickstart (Run Demo Locally)

Download the consistency trained networks

If you haven't done so already, download the pretrained models. The models used for the demo can be downloaded with the following command:

sh ./tools/download_models.sh

This downloads the baseline and consistency-trained models for the depth, normal, and reshading targets (1.3GB) to a folder called ./models/. Individual models can be downloaded here.

Run a model on your own image

To run the trained model of a task on a specific image:

python demo.py --task $TASK --img_path $PATH_TO_IMAGE_OR_FOLDER --output_path $PATH_TO_SAVE_OUTPUT

The --task flag specifies the target task for the input image, which should be either normal, depth or reshading.

To run the script for a normal target on the example image:

python demo.py --task normal --img_path assets/test.png --output_path assets/

This produces the output predictions from the baseline (test_normal_baseline.png) and consistency (test_normal_consistency.png) models.

Test image | Baseline | Consistency

Similarly, running for target tasks reshading and depth gives the following.

Baseline (reshading) | Consistency (reshading) | Baseline (depth) | Consistency (depth)
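If you want predictions for all three target tasks on the same image, a small wrapper like the following (a convenience sketch, not part of the repository) loops over the supported --task values:

```python
import subprocess

# Run the demo for every supported target task on the same input image.
# The paths below are the example assets shipped with the repo.
for task in ["normal", "depth", "reshading"]:
    subprocess.run(
        ["python", "demo.py",
         "--task", task,
         "--img_path", "assets/test.png",
         "--output_path", "assets/"],
        check=True,
    )
```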
<br> <br>

Energy Computation

Training with consistency involves several paths that each predict the target domain, but using different cues to do so. The disagreement between these predictions yields an unsupervised quantity, consistency energy, that our CVPR 2020 paper found correlates with prediction error. You can view the pixel-wise consistency energy (example below) using our live demo.

| Sample Image | Normal Prediction | Consistency Energy |
| --- | --- | --- |
| <img src=./assets/energy_query.png width="600"> | <img src=./assets/energy_normal_prediction.png width="600"> | <img src=./assets/energy_prediction.png width="600"> |
| Sample image from the Stanford 2D3DS dataset. | Some chair legs are missing in the RGB -> Normal prediction. | The white pixels indicate higher uncertainty about areas with missing chair legs. |
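Conceptually, the energy measures the disagreement between the different paths that predict the same target domain. The sketch below is not the implementation in scripts/energy_calc.py; it simply illustrates the idea by taking the per-pixel standard deviation across several hypothetical path predictions.

```python
import torch

def consistency_energy(path_predictions):
    """Pixel-wise disagreement between predictions of the same target domain.

    path_predictions: list of tensors of shape (B, C, H, W), one per path
                      (e.g. RGB->normal, RGB->curvature->normal, ...).
    Returns a (B, H, W) map; higher values mean the paths disagree more.
    """
    stacked = torch.stack(path_predictions, dim=0)   # (P, B, C, H, W)
    return stacked.std(dim=0).mean(dim=1)            # std over paths, mean over channels

# Toy example with three hypothetical paths predicting surface normals:
preds = [torch.rand(1, 3, 256, 256) for _ in range(3)]
energy = consistency_energy(preds)
print(energy.shape)   # torch.Size([1, 256, 256])
```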

To compute energy locally over many images and/or to plot energy vs. error, you can use the energy_calc.py script. For example, to reproduce the scatter plot below:

Energy vs. Error
Result from running the command below.

First download a subset of images from the Taskonomy buildings almena and albertville (512 images per domain, 388MB):

sh ./tools/download_data.sh

Second, download all the networks necessary to compute the consistency energy. The following script will download them for you, skipping previously downloaded models (0.8GB - 4.0GB):

sh ./tools/download_energy_graph_edges.sh

Now we are ready to compute energy. The following command generates a scatter plot of consistency energy vs. prediction error:

python -m scripts.energy_calc energy_calc --batch_size 2 --subset_size=128 --save_dir=results

By default, it computes the energy and error for subset_size points from the Taskonomy buildings almena and albertville. The error is computed for the normal target. The resulting plot is saved to energy.pdf in the directory given by --save_dir, and the corresponding data to data.csv.

Compute energy on arbitrary images

Consistency energy is an unsupervised quantity and as such, no ground-truth labels are necessary. To compute the energy for all query images in a directory, run:

python -m scripts.energy_calc energy_calc_nogt 
    --data-dir=PATH_TO_QUERY_IMAGE --batch_size 1 --save_dir=RESULTS_DIR \
    --subset_size=NUMBER_OF_IMAGES --cont=PATH_TO_TRAINED_MODEL

It will append a dashed horizontal line to the plot above at the energy level of the query image(s). This plot is saved to energy.pdf in RESULTS_DIR.

<br> <br>

Pretrained Models

We are providing all of our pretrained models for download. These models are the same ones used in the live demo and video evaluations.

Network Architecture

All networks are based on the UNet architecture. They take an input of size 256x256, and upsampling is done via bilinear interpolation instead of deconvolution. All models were trained with the L1 loss.

Download consistency-trained models

Instructions for downloading the trained consistency models can be found here:

sh ./tools/download_models.sh

This downloads the baseline and consistency-trained models for the depth, normal, and reshading targets (1.3GB) to a folder called ./models/. See the table below for specifics:

| Task Name | Output Dimension | Downsample Blocks |
| --- | --- | --- |
| RGB -> Depth-ZBuffer | 256x256x1 | 6 |
| RGB -> Reshading | 256x256x1 | 5 |
| RGB -> Surface-Normal | 256x256x3 | 6 |

Individual consistency models can be downloaded here.

Download perceptual networks

The pretrained perceptual models can be downloaded with the following command.

sh ./tools/download_percep_models.sh

This downloads the perceptual models for the depth, normal, and reshading targets (1.6GB). Each target has 7 pretrained models (from the domains listed below).

Curvature         Edge-3D            Reshading
Depth-ZBuffer     Keypoint-2D        RGB       
Edge-2D           Keypoint-3D        Surface-Normal 

The perceptual models' architectural hyperparameters are detailed in transfers.py; some of them were trained using the L2 loss. To use these models with the provided training code, place them in the path defined by MODELS_DIR in utils.py.

Individual perceptual models can be downloaded here.

Download baselines

We also provide the models for other baselines used in the paper. Many of these baselines appear in the live demo. The pretrained baselines can be downloaded here. Note that we will not be providing support for them.

*Models for other tasks are available using the visualpriors package or in Tensorflow via the Taskonomy GitHub page.

<br> <br>

Training

We used the provided training code to train our consistency models on the Taskonomy dataset. We used 3 V100 (32GB) GPUs to train our models; running them for 500 epochs takes about a week.

Runnable Example: You'll find that the code in the rest of this section expects about 12TB of data (9 single-image tasks from Taskonomy). For a quick runnable example that gives the gist, try the following:

First download the data and then start a visdom (logging) server:

sh ./tools/download_data.sh # Starter data (388MB)
visdom &                    # To view the telemetry

Then, start the training using the following command, which cascades two models (trains a normal model using curvature consistency on a training set of 512 images).

python -m train example_cascade_two_networks --k 1 --fast

You can add more perceptual losses by changing the config in energy.py. For example, train the above model using both curvature and 2D edge consistency:

python -m train example_normal --k 2 --fast

If you want to train on the full dataset or on your own dataset, read on.

The code is structured as follows:

config/             # Configuration parameters: where to save results, etc.
    split.txt           # Train, val split
    jobinfo.txt         # Defines job name, base_dir
modules/            # Network definitions
train.py            # Training script
dataset.py          # Creates dataloader
energy.py           # Defines path config, computes total loss, logging
models.py           # Implements forward backward pass
graph.py            # Computes path defined in energy.py
task_configs.py     # Defines task specific preprocessing, masks, loss fn
transfers.py        # Loads models
utils.py            # Defines file paths (described below) 
demo.py             # Demo script

Expected folder structure

The code expects folders structured as follows. These paths can be modified by changing the values in utils.py.

base_dir/                   # The following paths are defined in utils.py (BASE_DIR)
    shared/                 # with the corresponding variable names in brackets
        models/             # Pretrained models (MODELS_DIR)
        results_[jobname]/  # Checkpoint of model being trained (RESULTS_DIR)
        ood_standard_set/   # OOD data for visualization (OOD_DIR)
    data_dir/               # taskonomy data (DATA_DIRS)
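For reference, the path variables mentioned above might be defined in utils.py roughly as follows. This is an illustrative sketch only; the actual file may use different value formats or defaults.

```python
# Sketch of the kind of path configuration utils.py expects.
# Variable names are taken from the folder structure above; values are examples.
import os

BASE_DIR = "/path/to/base_dir"                                 # root of the folder structure
MODELS_DIR = os.path.join(BASE_DIR, "shared/models")           # pretrained models
RESULTS_DIR = os.path.join(BASE_DIR, "shared/results_myjob")   # checkpoints of the model being trained
OOD_DIR = os.path.join(BASE_DIR, "shared/ood_standard_set")    # OOD data for visualization
DATA_DIRS = [os.path.join(BASE_DIR, "data_dir")]               # Taskonomy data (may be a list of roots)
```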

Training with consistency

  1. Define locations for data, models, etc.: Create a jobinfo.txt file and define the name of the job and the absolute path to BASE_DIR, where data, models, and results will be stored, as shown in the folder structure above. An example config is provided in the starter code (configs/jobinfo.txt). To modify individual file paths, e.g. the models folder, change the MODELS_DIR variable in utils.py.

    We won't cover downloading the Taskonomy dataset, which can be downloaded following the instructions here.

  2. Download perceptual networks: If you want to initialize from our pretrained models, then download them with the following command (1.6GB):

    sh ./tools/download_percep_models.sh
    

    More info about the networks is available here.

  3. Train with consistency using the command:

    python -m train multiperceptual_{depth,normal,reshading}
    

    For example, to run the training code for the normal target, run

    python -m train multiperceptual_normal
    

    This trains the model for the normal target with 8 perceptual losses, i.e. curvature, edge2d, edge3d, keypoint2d, keypoint3d, reshading, depth, and imagenet. We used 3 V100 (32GB) GPUs to train our models; running them for 500 epochs takes about a week.

    Additional arguments can be specified during training; the most commonly used ones are listed below. For the full list, refer to the training script.

    • The flag --k defines the number of perceptual losses used; using a smaller subset reduces GPU memory requirements.
    • There are several options for how this subset is chosen: 1. randomly (--random-select), 2. by win rate (--winrate).
    • Data augmentation is not applied by default; it can be enabled with the flag --dataaug. The transformations applied are 1. random crop with probability 0.5, and 2. color jitter with probability 0.5.

    To train a normal target domain with 2 perceptual losses selected randomly each epoch, run the following command.

    python -m train multiperceptual_normal --k 2 --random-select
    
  4. Logging: The losses and visualizations are logged in Visdom. This can be accessed via [server name]/env/[job name], e.g. localhost:8888/env/normaltarget_allperceps.

    An example visualization is shown below. We plot the outputs from the paths defined in the energy configuration used. Two windows are shown: one shows the predictions before training starts, while the other updates them after each epoch. The labels for each column can be found at the top of the window. The second column shows the target's ground truth y^, and the third its prediction n(x) from the RGB image x. Thereafter, each pair of images in the same domain shows the predictions of the paths f(y^) and f(n(x)), where f maps from the target domain to another domain, e.g. curvature.

    Logging conventions: For uninteresting historical reasons, the columns in the logging during training might have strange names. You can define your own names instead of using these by changing the config file in energy.py.

    Here's a quick guide to the current convention. For example, when training with a normal model using consistency:

    • The RGB input is denoted as x and the target domain as y. The ground truth label for a domain is marked with a ^ (e.g. y^ for the target domain).
    • The direct (RGB -> Z) and perceptual (target [Y] -> Z) transfer functions are named as follows:<br>(i.e. the function for rgb to curvature is RC; for normal to curvature it's f)
    | Domain (Z) | rgb -> Z<br>(Direct) | Y -> Z<br>(Perceptual) | Domain (Z) | rgb -> Z<br>(Direct) | Y -> Z<br>(Perceptual) |
    | --- | --- | --- | --- | --- | --- |
    | target | n | - | keypoints2d | k2 | Nk2 |
    | curvature | RC | f | keypoints3d | k3 | Nk3 |
    | sobel edges | a | s | edge occlusion | E0 | nE0 |

To train on other target domains

  1. A new configuration should be defined in the energy_configs dictionary in energy.py.

    Description of the information needed:

    • paths: X1->X2->X3. The keys in this dictionary use a function notation, e.g. f(n(x)), with the corresponding value being a list of task objects that defines the domains being transferred, e.g. [rgb, normal, curvature]. The rgb input is defined as x, n(x) returns normal predictions from rgb, and f(n(x)) returns curvature from normal. These notations do not need to be the same for all configurations. The table below lists those that have been kept constant for all targets. (A schematic example of a configuration entry is shown after this list.)
    • freeze_list: the models that will not be optimized,
    • losses: loss terms to be constructed from the paths defined above,
    • plots: the paths to plots in the visdom environment.
  2. New models may need to be defined in the pretrained_transfers dictionary in transfers.py. For example, for a curvature target and a perceptual model from curvature to normal, the code will look for the principal_curvature2normal.pth file in MODELS_DIR if it is not defined in transfers.py.
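A schematic example of what such a configuration entry could look like is shown below. This is a simplified illustration, not the actual contents of energy.py; names such as rgb, normal, and curvature stand for the task objects defined in the repository, and the exact value formats may differ.

```python
# Schematic illustration of an energy_configs entry (not copied from energy.py).
# Strings stand in for the task objects used in the repository.
energy_configs_example = {
    "my_normal_target": {
        "paths": {
            "x": ["rgb"],                               # input image
            "y^": ["normal"],                           # ground-truth target
            "n(x)": ["rgb", "normal"],                  # direct prediction path
            "f(y^)": ["normal", "curvature"],           # ground truth mapped to curvature
            "f(n(x))": ["rgb", "normal", "curvature"],  # prediction mapped to curvature
        },
        "freeze_list": [["normal", "curvature"]],       # transfers that are not optimized
        "losses": {
            "direct": ("n(x)", "y^"),                   # supervised term
            "percep_curv": ("f(n(x))", "f(y^)"),        # consistency term
        },
        "plots": ["y^", "n(x)", "f(y^)", "f(n(x))"],    # columns shown in Visdom
    },
}
```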

To train on other datasets

The expected folder structure for the data is:

DATA_DIRS/
  [building]_[domain]/
      [domain]/
          [view]_domain_[domain].png
          ...

PyTorch's dataset __getitem__ method has been overridden to return a tuple of all tasks for a given building and viewpoint. This is done in datasets.py. Thus, for other folder structures, a function that returns the corresponding file paths for the different domains should be defined.
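A minimal sketch of this idea, assuming the folder pattern shown above and hypothetical names (MultiTaskDataset, path_for) that are not part of the repository:

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class MultiTaskDataset(Dataset):
    """Sketch of a dataset that returns one image per task for a given
    (building, view) pair, following the folder pattern shown above."""

    def __init__(self, data_dir, buildings, views, domains):
        self.data_dir = data_dir
        self.domains = domains
        # One sample per (building, view) pair.
        self.samples = [(b, v) for b in buildings for v in views]

    def path_for(self, building, view, domain):
        # DATA_DIRS/[building]_[domain]/[domain]/[view]_domain_[domain].png
        return os.path.join(
            self.data_dir,
            f"{building}_{domain}", domain,
            f"{view}_domain_{domain}.png",
        )

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        building, view = self.samples[idx]
        # Return a tuple with one entry per task, all from the same viewpoint.
        return tuple(
            Image.open(self.path_for(building, view, d)) for d in self.domains
        )
```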

Task-specific configs, such as transformations and masks, are defined in task_configs.py.

<br> <br>

Citation

If you find the code, models, or data useful, please cite this paper:

@article{zamir2020consistency,
  title={Robust Learning Through Cross-Task Consistency},
  author={Zamir, Amir and Sax, Alexander and Yeo, Teresa and Kar, Oğuzhan and Cheerla, Nikhil and Suri, Rohan and Cao, Zhangjie and Malik, Jitendra and Guibas, Leonidas},
  journal={arXiv},
  year={2020}
}