ViRB

ViRB is a framework for evaluating the quality of representations learned by visual encoders on a variety of downstream tasks. It is the codebase used by the paper Contrasting Contrastive Self-Supervised Representation Learning Pipelines. Since the goal is to evaluate the learned representations, ViRB freezes the encoder weights and trains only a small end task network on the latent representations of each task's train set, then evaluates it on that task's test set. To speed this process up, the train and test sets are pre-encoded for most of the end tasks and stored in GPU memory for efficient access. Fine-tuning the encoder is also supported but takes significantly more time. ViRB is fully implemented in PyTorch and automatically scales to as many GPUs as are available on your machine. It supports evaluating any PyTorch model architecture on a select subset of tasks.
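The overall protocol looks roughly like the following minimal sketch (illustrative only, not ViRB's actual internals; the encoder, data loaders, and helper names here are generic stand-ins, and the encoder is assumed to map images directly to embedding tensors):

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

@torch.no_grad()
def pre_encode(encoder, loader):
    # Run the frozen encoder once and cache embeddings and labels on the GPU.
    encoder.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        feats.append(encoder(x.to(device)))
        labels.append(y.to(device))
    return torch.cat(feats), torch.cat(labels)

def train_end_task_head(train_feats, train_labels, num_classes, epochs=100):
    # Train a small end task network on the cached embeddings only;
    # the encoder weights are never updated.
    head = nn.Linear(train_feats.shape[1], num_classes).to(device)
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(head(train_feats), train_labels)
        loss.backward()
        opt.step()
    return head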

Installation

To install the codebase, simply clone this repository from GitHub and install the requirements:

git clone https://github.com/klemenkotar/ViRB
cd ViRB
pip install -r requirements.txt

Quick Start

For a quick starting example, we will train an end task network on the simple Caltech-101 classification task using the SWAV 800 encoder.

First we need to download the encoder:

mkdir pretrained_weights
wget https://prior-model-weights.s3.us-east-2.amazonaws.com/contrastive_encoders/SWAV_800.pt 
mv SWAV_800.pt pretrained_weights/

Then we need to download the Caltech-101 dataset from here. After extracting it, you should have a directory named 101_ObjectCategories. Rename it to data/caltech/.

Now we are ready to start the training run with the following command:

python main.py --experiment_list=configs/experiment_lists/swav.yaml --virb_configs=configs/virb_configs/caltech.yaml

The codebase will automatically use a GPU if one is available on the machine. The progress will be printed on the screen along with an ETA for completion.

Live TensorBoard logs can be accessed by running the following command:

tensorboard --logdir=out

Once training is complete, the task head model and a results JSON file will be stored in the out/ directory.

Dataset Download

To run the full suite of end tasks we need to download all the associated datasets. All the datasets should be stored in a folder called data/ inside the root project directory. Below is a table with links where the data can be downloaded and the names of the directories they should be placed in.

Due to the complex nature and diversity of dataset licensing, we provide four types of links: Data, a direct link to a compressed file that can be downloaded from the internet; Website, a link to a website with instructions for downloading the data in question; JSON, a link to a supplementary JSON file that adds metadata on top of another existing dataset; and txt, a list of resources that need to be downloaded.

| Dataset Name | Dataset Size (examples) | Directory | Download Link | Download Size | Note |
| --- | --- | --- | --- | --- | --- |
| ImageNet Cls. | 1,281,167 | data/imagenet/ | Website | 126.2 GB | |
| Pets Cls. | 3,680 | data/pets/ | Data | 0.82 GB | |
| CalTech Cls. | 3,060 | data/caltech-101/ | Data | 0.14 GB | |
| CIFAR-100 Cls. | 50,000 | data/cifar-100/ | Data | 0.19 GB | |
| SUN Scene Cls. | 87,003 | data/SUN397/ | Data | 38.0 GB | |
| Eurosat Cls. | 21,600 | data/eurosat/ | Data | 0.1 GB | |
| dtd Cls. | 3,760 | data/dtd/ | Data | 0.63 GB | |
| Kinetics Action Pred. | 50,000 | data/kinetics400/ | Website | 0.63 GB | |
| CLEVR Count | 70,000 | data/CLEVR/ | Data | 20.0 GB | |
| THOR Num. Steps | 60,000 | data/thor_num_steps/ | Data | 0.66 GB | |
| THOR Egomotion | 60,000 | data/thor_action_prediction/ | Data | 1.3 GB | |
| nuScenes Egomotion | 28,000 | data/nuScenes/ | Website, JSON, JSON | 53.43 GB | Download samples and sweeps |
| Cityscapes Seg. | 3,475 | data/cityscapes/ | Website | 61.89 GB | |
| Pets Instance Seg. | 3,680 | data/pets/ | Data, Masks | 0.82 GB | |
| EgoHands Seg. | 4,800 | data/egohands/ | Data | 1.35 GB | |
| THOR Depth | 60,000 | data/thor_depth_prediction/ | Data | 0.25 GB | |
| Taskonomy Depth | 39,995 | data/taskonomy/ | Link, txt | 48.09 GB | Download the rgb and depth_zbuffer data for the scenes listed in txt |
| NYU Depth | 1,159 | data/nyu/ | Data | 5.62 GB | Same data as NYU Walkable |
| NYU Walkable | 1,159 | data/nyu/ | Data | 5.62 GB | Same data as NYU Depth |
| KITTI Opt. Flow | 200 | data/KITTI/ | Data | 1.68 GB | |

Pre-trained Models

As part of our paper we trained several new encoders using a combination of training algorithms and datasets. Below is a table containing the download links for the weights. The weights are stored in the standard PyTorch format. To work with this codebase, the models should be downloaded into a directory called pretrained_weights/ inside the root project directory.

| Encoder Name | Method | Dataset | Dataset Size | Number of Epochs | Link |
| --- | --- | --- | --- | --- | --- |
| SwAV ImageNet 100 | SwAV | ImageNet | 1.3M | 100 | Link |
| SwAV ImageNet 50 | SwAV | ImageNet | 1.3M | 50 | Link |
| SwAV Half ImageNet 200 | SwAV | ImageNet-1/2 | 0.5M | 200 | Link |
| SwAV Half ImageNet 100 | SwAV | ImageNet-1/2 | 0.5M | 100 | Link |
| SwAV Quarter ImageNet 200 | SwAV | ImageNet-1/4 | 0.25M | 200 | Link |
| SwAV Linear Unbalanced ImageNet 200 | SwAV | ImageNet-1/2-Lin | 0.5M | 200 | Link |
| SwAV Linear Unbalanced ImageNet 100 | SwAV | ImageNet-1/2-Lin | 0.5M | 100 | Link |
| SwAV Log Unbalanced ImageNet 200 | SwAV | ImageNet-1/4-Log | 0.25M | 200 | Link |
| SwAV Places 200 | SwAV | Places | 1.3M | 200 | Link |
| SwAV Kinetics 200 | SwAV | Kinetics | 1.3M | 200 | Link |
| SwAV Taskonomy 200 | SwAV | Taskonomy | 1.3M | 200 | Link |
| SwAV Combination 200 | SwAV | Combination | 1.3M | 200 | Link |
| MoCov2 ImageNet 100 | MoCov2 | ImageNet | 1.3M | 100 | Link |
| MoCov2 ImageNet 50 | MoCov2 | ImageNet | 1.3M | 50 | Link |
| MoCov2 Half ImageNet 200 | MoCov2 | ImageNet-1/2 | 0.5M | 200 | Link |
| MoCov2 Half ImageNet 100 | MoCov2 | ImageNet-1/2 | 0.5M | 100 | Link |
| MoCov2 Quarter ImageNet 200 | MoCov2 | ImageNet-1/4 | 0.25M | 200 | Link |
| MoCov2 Linear Unbalanced ImageNet 200 | MoCov2 | ImageNet-1/2-Lin | 0.5M | 200 | Link |
| MoCov2 Linear Unbalanced ImageNet 100 | MoCov2 | ImageNet-1/2-Lin | 0.5M | 100 | Link |
| MoCov2 Log Unbalanced ImageNet 200 | MoCov2 | ImageNet-1/4-Log | 0.25M | 200 | Link |
| MoCov2 Places 200 | MoCov2 | Places | 1.3M | 200 | Link |
| MoCov2 Kinetics 200 | MoCov2 | Kinetics | 1.3M | 200 | Link |
| MoCov2 Taskonomy 200 | MoCov2 | Taskonomy | 1.3M | 200 | Link |
| MoCov2 Combination 200 | MoCov2 | Combination | 1.3M | 200 | Link |

We also used some models trained by third-party authors. Below is a table of download links for their models and the scripts used to convert the weights from their format to the ViRB format. All of the conversion scripts have the exact same usage: <SCRIPT_NAME> <DOWNLOADED_WEIGHT_FILE> <DESIRED_VIRB_FORMAT_OUTPUT_PATH>.

| Encoder Name | Method | Dataset | Dataset Size | Number of Epochs | Link | Conversion Script |
| --- | --- | --- | --- | --- | --- | --- |
| SwAV ImageNet 800 | SwAV | ImageNet | 1.3M | 800 | Link | scripts/swav_to_virb.py |
| SwAV ImageNet 200 | SwAV | ImageNet | 1.3M | 200 | Link | scripts/swav_to_virb.py |
| MoCov1 ImageNet 200 | MoCov1 | ImageNet | 1.3M | 200 | Link | scripts/moco_to_virb.py |
| MoCov2 ImageNet 800 | MoCov2 | ImageNet | 1.3M | 800 | Link | scripts/moco_to_virb.py |
| MoCov2 ImageNet 200 | MoCov2 | ImageNet | 1.3M | 200 | Link | scripts/moco_to_virb.py |
| PIRL ImageNet 800 | PIRL | ImageNet | 1.3M | 800 | Link | scripts/pirl_to_virb.py |

End Task Training

ViRB supports 20 end tasks that are classified as Image-level or Pixelwise depending on the output modality of the task. Furthermore, each task is also classified as either semantic or structural. Below is an illustration of the space of our tasks. For further details please see Contrasting Contrastive Self-Supervised Representation Learning Pipelines.

[Figure: illustration of the end task space]

After installing the codebase and downloading the datasets and pre-trained models, we are ready to run our experiments. To reproduce every experiment in the paper, run:

python main.py --experiment_list=configs/experiment_lists/all.yaml --virb_configs=configs/virb_configs/all.yaml

WARNING: this will take well over 1000 GPU hours to train, so we suggest training a subset instead. The results of all these training runs are summarized in the graph below.

[Figure] Correlation of end task performances with ImageNet classification accuracy. The plots show the end task performance against the ImageNet top-1 accuracy for all end tasks and encoders. Each point represents a different encoder trained with a different algorithm and dataset. This reveals the lack of a strong correlation between performance on ImageNet classification and tasks from other categories.

To specify which tasks we want to train, we create a virb_config YAML file that defines the task name and training configuration. The file configs/virb_configs/all.yaml contains configurations for every task supported by this package, so it is a good starting point. We can select only a few tasks to train and comment out the other configurations.

To specify which weights we want to use, we provide an experiment list file. The file configs/experiment_lists/all.yaml contains all the model weights provided by this repository. We can select only a few models to train and comment out the other configurations. Alternatively, we can add new weights of our own to the list; we just have to make sure they are ResNet50 weights stored as a standard PyTorch weight file.
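For example, to evaluate your own ResNet50 checkpoint you might save its state dict as a standard PyTorch weight file and point a new experiment list entry at it (an illustrative sketch; the file name is made up, and the state dict key layout produced by torchvision may need adapting to whatever ViRB's ResNet50Encoder expects):

import torch
from torchvision.models import resnet50

# Hypothetical example: store a ResNet50 state dict as a standard PyTorch
# weight file so it can be referenced from an experiment list file.
model = resnet50()
torch.save(model.state_dict(), "pretrained_weights/my_resnet50.pt")

The corresponding experiment list entry would then look something like MyResNet50: 'pretrained_weights/my_resnet50.pt'.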

Training a SWAV Encoder on the ImageNet End Task

To train a model using the SWAV encoder on the ImageNet classification end task, download the ImageNet dataset from the link in the Dataset Download table above, and the SwAV ImageNet 800 model from the Pre-trained Models tables above.

Then create a new file inside configs/virb_configs/ that contains just the ImageNet configuration:

Imagenet:
 task: "Imagenet"
 training_configs:
   adam-0.0001:
     optimizer: "adam"
     lr: 0.0001
 num_epochs: 100
 batch_size: 32

Then create a new file inside configs/experiment_lists/ that contains just the SWAV model:

SWAV_800: 'pretrained_weights/SWAV_800.pt'

Now run this configuration with the following command:

python main.py --experiment_list=configs/experiment_lists/EXPERIMENT_LIST_FILE_NAME.yaml --virb_configs=configs/virb_configs/VIRB_CONFIG_FILE_NAME.yaml

Hyperparameter Search

One feature offered by this codebase is the ability to train the end task networks using several combinations of optimizers, schedulers, and hyperparameters. For the Image-level tasks (which are encodable), the dataset is encoded only once and a separate model is then trained for each set of hyperparameters, which improves efficiency.

An example of a grid search configuration can be found in configs/virb_configs/imagenet_grid_search.yaml, and it looks like this:

Imagenet:
 task: "Imagenet"
 training_configs:
   adam-0.0001:
     optimizer: "adam"
     lr: 0.0001
   adam-0.001:
     optimizer: "adam"
     lr: 0.001
   sgd-0.01-StepLR:
     optimizer: "sgd"
     lr: 0.01
     scheduler:
       type: "StepLR"
       step_size: 50
       gamma: 0.1
   sgd-0.01-OneCycle:
     optimizer: "sgd"
     lr: 0.01
     scheduler:
       type: "OneCycle"
   sgd-0.01-Poly:
     optimizer: "sgd"
     lr: 0.001
     scheduler:
       type: "Poly"
       exponent: 0.9
 num_epochs: 100
 batch_size: 32

We specify each training config as a YAML object. The "sgd" and "adam" optimizers are supported, as well as the "StepLR", "OneCycle" and "Poly" schedulers from PyTorch's optim package. All schedulers are compatible with all of the optimizers.
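As a rough illustration of how one of these training config entries could map onto PyTorch objects (this is not ViRB's actual implementation; the helper name, the SGD momentum value, and the step counts are assumptions):

import torch

def build_optimizer_and_scheduler(params, cfg, steps_per_epoch, num_epochs):
    # Hypothetical helper: turn one training config entry (a dict parsed from
    # the YAML above) into a torch.optim optimizer and optional scheduler.
    if cfg["optimizer"] == "adam":
        opt = torch.optim.Adam(params, lr=cfg["lr"])
    else:  # "sgd" (momentum value assumed here)
        opt = torch.optim.SGD(params, lr=cfg["lr"], momentum=0.9)

    sched_cfg = cfg.get("scheduler")
    if sched_cfg is None:
        return opt, None
    if sched_cfg["type"] == "StepLR":
        sched = torch.optim.lr_scheduler.StepLR(
            opt, step_size=sched_cfg["step_size"], gamma=sched_cfg["gamma"])
    elif sched_cfg["type"] == "OneCycle":
        sched = torch.optim.lr_scheduler.OneCycleLR(
            opt, max_lr=cfg["lr"], total_steps=steps_per_epoch * num_epochs)
    else:  # "Poly" (PolynomialLR requires a recent PyTorch version)
        sched = torch.optim.lr_scheduler.PolynomialLR(
            opt, total_iters=num_epochs, power=sched_cfg["exponent"])
    return opt, sched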

To execute this ImageNet grid search run:

python main.py --experiment_list=configs/experiment_lists/swav.yaml --virb_configs=configs/virb_configs/imagenet_grid_search.yaml

Testing Only Datasets

One additional feature this codebase supports is datasets that are "eval only" and use a task head trained on a different task. The only currently supported example is ImageNetV2. To test the SWAV 800 model on ImageNetV2, first train at least one ImageNet end task head on SWAV 800 and then run the following command:

python main.py --experiment_list=configs/experiment_lists/swav.yaml --virb_configs=configs/virb_configs/imagenetv2.yaml

Custom Models

All the encoders in the tutorials thus far have used the ResNet50 architecture, but we also support using custom encoders.

All of the Image-level tasks require that the encoder output a dictionary with the key "embedding" mapping to a PyTorch tensor of size NxD, where N is the batch size and D is an arbitrary embedding size.

All of the Pixelwise tasks require that the encoder output a dictionary containing a tensor for the representation after every block. In practice this means the model needs to output five tensors with sizes corresponding to the outputs of a ResNet50's conv, block1, block2, block3, and block4 layers.

To use a custom model, simply modify main.py by replacing ResNet50Encoder with any encoder that produces the outputs described above.
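As an illustrative sketch, a custom encoder wrapping a torchvision ResNet50 could expose these outputs as follows (the "embedding" key matches the requirement above, while the per-block key names are assumptions and should be checked against what the ViRB task heads actually read):

import torch
import torch.nn as nn
from torchvision.models import resnet50

class CustomEncoder(nn.Module):
    # Illustrative custom encoder. For a 224x224 input the per-block outputs
    # have shapes Nx64x112x112, Nx256x56x56, Nx512x28x28, Nx1024x14x14 and
    # Nx2048x7x7, matching the ResNet50 conv and block1-block4 layers.
    def __init__(self):
        super().__init__()
        r = resnet50()
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)
        self.pool = r.maxpool
        self.block1, self.block2 = r.layer1, r.layer2
        self.block3, self.block4 = r.layer3, r.layer4
        self.avgpool = r.avgpool

    def forward(self, x):
        c = self.stem(x)
        b1 = self.block1(self.pool(c))
        b2 = self.block2(b1)
        b3 = self.block3(b2)
        b4 = self.block4(b3)
        emb = torch.flatten(self.avgpool(b4), 1)  # NxD embedding with D=2048
        return {"embedding": emb, "conv": c, "block1": b1,
                "block2": b2, "block3": b3, "block4": b4}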

Citation

@inproceedings{kotar2021contrasting,
  title={Contrasting Contrastive Self-Supervised Representation Learning Pipelines},
  author={Klemen Kotar and Gabriel Ilharco and Ludwig Schmidt and Kiana Ehsani and Roozbeh Mottaghi},
  booktitle={ICCV},  
  year={2021},
}