
Deep Learning under Privileged Information Using Heteroscedastic Dropout (CVPR 2018, Official Repo)

This is the code for the paper:

Deep Learning Under Privileged Information Using Heteroscedastic Dropout <br> John Lambert*, Ozan Sener*, Silvio Savarese <br> Presented at CVPR 2018

The paper can be found on arXiv here.

This repository also includes an implementation for repeatable random data augmentation transformations, useful for transforming images and bounding boxes contained therein identically.
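As a hedged illustration of the idea (not the repo's actual API), a seeded random decision lets the same flip be applied to both an image and its bounding boxes:

```python
import random

def make_repeatable_hflip(seed):
    """Return a pair of transforms driven by one shared random decision,
    so an image and its bounding boxes are flipped (or not) together.
    (Illustrative sketch only -- not the API used in this repo.)"""
    flip = random.Random(seed).random() < 0.5

    def flip_image(pixels):  # pixels: list of rows
        return [list(reversed(row)) for row in pixels] if flip else pixels

    def flip_box(box, width):  # box: (xmin, ymin, xmax, ymax)
        if not flip:
            return box
        xmin, ymin, xmax, ymax = box
        return (width - xmax, ymin, width - xmin, ymax)

    return flip_image, flip_box

img = [[1, 2, 3], [4, 5, 6]]
flip_image, flip_box = make_repeatable_hflip(seed=1)  # seed=1 draws < 0.5, so both flip
print(flip_image(img), flip_box((0, 0, 1, 1), width=3))
```

Because the decision is drawn once from the seeded generator, re-running the pair with the same seed reproduces the exact same transformation.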

<div align='center'> <img src='images/pull_figure.png' height="250px"> </div>
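At a high level, heteroscedastic dropout multiplies activations by noise whose variance is a function of the privileged information x*, rather than using a fixed dropout rate. A minimal plain-Python sketch of that idea (not the paper's exact architecture, where the variances are produced by a network inside the model):

```python
import random

def heteroscedastic_dropout(activations, sigmas, rng):
    """Multiply each activation by Gaussian noise N(1, sigma_i^2),
    where each sigma_i would be predicted from the privileged
    information x* by a small network (conceptual sketch only)."""
    return [a * rng.gauss(1.0, s) for a, s in zip(activations, sigmas)]

rng = random.Random(0)
# When the predicted variance is zero, the layer reduces to the identity:
print(heteroscedastic_dropout([1.0, 2.0], [0.0, 0.0], rng))  # [1.0, 2.0]
```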

If you find this code useful for your research, please cite

@InProceedings{Lambert_2018_CVPR,
author = {Lambert, John and Sener, Ozan and Savarese, Silvio},
title = {Deep Learning Under Privileged Information Using Heteroscedastic Dropout},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

In this repository we provide:

- The official implementation of heteroscedastic dropout, where the dropout noise is a function of the privileged information x*
- Scripts for preparing the ImageNet CLS-LOC dataset and its bounding box annotations
- Training and evaluation scripts for the CNN and RNN models from the paper

We also provide implementations of various baselines that use privileged information, including:

- Modality hallucination with shared parameters
- MIML-FCN (VGG and ResNet backbones)
- GO-CNN (VGG backbone)
- Dropout with random Gaussian noise
- Standard Bernoulli dropout

Setup

All code is implemented in PyTorch.

First install PyTorch and torchvision (and CUDA, if you have a GPU). With Conda and Python 2.7 on Linux, the installation looks something like:

conda install pytorch torchvision -c pytorch

(Optional) GPU Acceleration

If you have an NVIDIA GPU, you can accelerate all operations with CUDA.

First install CUDA.

(Optional) cuDNN

When using CUDA, you can use cuDNN to accelerate convolutions.

First download cuDNN and copy the libraries to /usr/local/cuda/lib64/.

Download ImageNet CLS-LOC

<div align='center'> <img src='images/imagenet_logo.jpg' height="40px"> </div>

First, register and create an ImageNet account.

Next, download the 1.28 million training images.

Now, download the XML bounding box annotations, either via the link here (42.8 MB) or via the command line:

wget http://image-net.org/Annotation/Annotation.tar.gz

The XML annotations are stored in nested tar.gz files. They can be extracted recursively with tar, which takes around 10 minutes on a typical workstation:

mkdir bbox_annotation
tar -xvzf Annotation.tar.gz -C bbox_annotation
rm Annotation.tar.gz
cd bbox_annotation
for a in *.tar.gz; do tar -xzf "$a"; done
rm *.tar.gz

We now have a directory called bbox_annotation/Annotation that contains .xml files with bounding box information for 3,627 classes ("synsets") of ImageNet. We will use only the 1000 classes featured in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) task.
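Each .xml file follows the PASCAL VOC annotation format: `object` entries with a `bndbox` holding pixel coordinates. A sketch of parsing one with the standard library (the field names assume the VOC layout; the synset ID and coordinates below are made up for illustration):

```python
import xml.etree.ElementTree as ET

# A minimal VOC-style annotation, inlined for illustration; in practice
# you would parse the files under bbox_annotation/Annotation/.
XML = """
<annotation>
  <object>
    <name>n02500267</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>
"""

def parse_boxes(xml_text):
    """Return (synset, (xmin, ymin, xmax, ymax)) for each annotated object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter('object'):
        bb = obj.find('bndbox')
        coords = tuple(int(bb.findtext(k)) for k in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append((obj.findtext('name'), coords))
    return boxes

print(parse_boxes(XML))  # [('n02500267', (10, 20, 110, 220))]
```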

At this point, we'll arrange the image data into three folders: "train", "val", and "test". Note that the downloads are large: val.zip is 6.3 GB and train.zip is 56 GB.

On the ILSVRC 2016 page on the ImageNet website, find and download the file named

ILSVRC2016_CLS-LOC.tar.gz

This is the Classification-Localization dataset (155 GB), unchanged since ILSVRC2012. There are a total of 1,281,167 images for training. The number of images per synset (category) ranges from 732 to 1300. There are 50,000 validation images, with 50 images per synset, and 100,000 test images. All images are in JPEG format.

It is arranged as follows: {split}/{synset_name}/{file_name}.JPEG

For example, ImageNet_2012/train/n02500267/02500267_2597.JPEG
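Given that layout, the split and synset can be recovered directly from a path. A small illustrative helper (not part of the repo):

```python
from pathlib import PurePosixPath

def parse_imagenet_path(path):
    """Split an ImageNet path of the form {split}/{synset}/{file}.JPEG
    into its components (illustrative helper only)."""
    parts = PurePosixPath(path).parts
    return parts[-3], parts[-2], parts[-1]  # (split, synset, filename)

print(parse_imagenet_path('ImageNet_2012/train/n02500267/02500267_2597.JPEG'))
# ('train', 'n02500267', '02500267_2597.JPEG')
```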

We will use the subset of CLS-LOC images that have bounding box annotations. We'll then draw subsets of these annotated images to evaluate sample efficiency. Run:

mkdir ImageNetLocalization
python cnns/imagenet/create_bbox_dataset.py
python cnns/imagenet/create_imagenet_test_set.py
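The sample-efficiency experiments train on fixed-size subsets of the annotated images (e.g. 75 examples per class, as reflected in the saved model names below). A hedged sketch of drawing such a subset reproducibly, not the logic of the scripts above:

```python
import random
from collections import defaultdict

def subsample_per_class(paths, n_per_class, seed=0):
    """Group image paths by synset (the parent directory) and keep a
    reproducible random sample of n_per_class images from each class.
    (Sketch only -- the repo's dataset scripts may differ.)"""
    by_class = defaultdict(list)
    for p in paths:
        by_class[p.split('/')[-2]].append(p)
    rng = random.Random(seed)
    subset = []
    for synset in sorted(by_class):
        files = sorted(by_class[synset])
        rng.shuffle(files)
        subset.extend(files[:n_per_class])
    return subset

paths = ['train/n01/a.JPEG', 'train/n01/b.JPEG', 'train/n02/c.JPEG']
print(subsample_per_class(paths, n_per_class=1))
```

Sorting before shuffling with a fixed seed makes the subset independent of filesystem ordering, so every run trains on the same images.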

Training CNN Models From Scratch

The script train.py lets you train a new CNN model from scratch.

python cnns/train/train.py

By default this script runs on the GPU; to run on the CPU, remove the `.cuda()` calls in the code (or replace them with the device-agnostic `torch.device` pattern).


License

Free for personal or research use; for commercial use please contact me.

Pretrained RNN Models

<div align='center'> <img src='images/fig_lstm.png' height="225px"> </div>

Pretrained CNN Models

<div align='center'> <img src='images/fig_cnn.png' height="225px"> </div>

File explaining some of the model names: https://docs.google.com/document/d/1KBjYK52Jvcd8cYpIZPPRUrXGSu6jFsBL5O3FwvtBr_Q/edit?usp=sharing

 { 'model_type' : ModelType.DROPOUT_FN_OF_XSTAR,
   'model_fpath': '/vision/group/ImageNetLocalization/saved_imagenet_models/ImageNet_Localization_1000_Class_75perclass_identity_ATtest_lambda100_BS64'
 },
 { 'model_type' : ModelType.MODALITY_HALLUC_SHARED_PARAMS,
   'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_25_05_46_34_num_ex_per_cls_75_bs_128_optimizer_type_sgd_model_type_ModelType.MODALITY_HALLUC_SHARED_PARAMS_lr_0.01_fixlrsched_False'},
 { 'model_type' : ModelType.MIML_FCN_VGG,
   'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_25_05_31_49_num_ex_per_cls_75_bs_128_optimizer_type_sgd_model_type_ModelType.MIML_FCN_VGG_lr_0.01_fixlrsched_False'},
 { 'model_type' : ModelType.MIML_FCN_RESNET,
   'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_25_02_59_54_num_ex_per_cls_75_bs_256_optimizer_type_sgd_model_type_ModelType.MIML_FCN_RESNET_lr_0.1_fixlrsched_False'},
 { 'model_type' : ModelType.GO_CNN_VGG,
   'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_24_04_20_02_num_ex_per_cls_75_bs_256_optimizer_type_adam_model_type_ModelType.GO_CNN_VGG_lr_0.001_fixlrsched_False'},
 { 'model_type' : ModelType.DROPOUT_RANDOM_GAUSSIAN_NOISE,
   'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_22_07_28_56_num_ex_per_cls_75_bs_256_optimizer_type_sgd_model_type_ModelType.DROPOUT_RANDOM_GAUSSIAN_NOISE_lr_0.01_fixlrsched_False'},
 { 'model_type' : ModelType.DROPOUT_BERNOULLI,
   'model_fpath': '/vision/group/ImageNetLocalization/saved_SGDM_imagenet_models/2018_03_07_13_20_12_num_ex_per_cls_75_bs_256_optimizer_type_sgd_dropout_type_bernoulli_lr_0.01_fixlrsched_False'},