

CASTing Your Model: Learning to Localize Improves Self-supervised Representations

This is a PyTorch implementation of our CVPR'21 paper.

The code is built on top of the MoCo framework:

  author  = {Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  title   = {Momentum Contrast for Unsupervised Visual Representation Learning},
  journal = {arXiv preprint arXiv:1911.05722},
  year    = {2019},


Install PyTorch and prepare the ImageNet dataset following the official PyTorch ImageNet training code.

Dataset setup

The code requires a folder of train and val images under <ImageFolder> and precomputed saliency maps under <MaskFolder>. We use the saliency maps from DeepUSPS (code found here). We provide saliency maps computed for COCO here.
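Before launching a long training run it can help to confirm that every image has a saliency map. The sketch below is one way to do that, assuming that <MaskFolder> mirrors the directory structure of <ImageFolder> and that each mask shares its image's file stem with a `.png` extension. Both the layout and the helper name `find_images_missing_masks` are assumptions made for illustration; check them against the saliency maps you actually downloaded or generated.

```python
from pathlib import Path

# Image extensions to scan for; an assumption, extend as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_images_missing_masks(image_root, mask_root, mask_ext=".png"):
    """Return image paths (relative to image_root) that have no mask file.

    Assumes (hypothetically) that mask_root mirrors image_root and that a
    mask has the same stem as its image, with extension mask_ext.
    """
    image_root, mask_root = Path(image_root), Path(mask_root)
    missing = []
    for img in sorted(image_root.rglob("*")):
        if img.is_file() and img.suffix.lower() in IMAGE_EXTS:
            rel = img.relative_to(image_root)
            expected_mask = (mask_root / rel).with_suffix(mask_ext)
            if not expected_mask.exists():
                missing.append(rel)
    return missing
```

An empty return value means every image found under the image root has a matching mask under the assumed naming scheme.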

Pre-trained models

The CAST pre-trained models trained on COCO can be found here.

Unsupervised Training

To run unsupervised pre-training of a ResNet-50 model on ImageNet on an 8-GPU machine, run:

python main_cast.py \
  -a resnet50 \
  --cos \
  --lr 0.5 \
  --batch-size 256 \
  --dist-url 'tcp://localhost:10001' \
  <ImageFolder> \
  --mask-dir <MaskFolder> \
  --crit-gcam cosine \
  --alpha-masked 3 \
  --second-constraint "ref" \
  --output-mask-region "ref" \
  --num-gpus-per-machine 8 \
  --print-freq 10 \
  --workers 8
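For intuition about the `--cos` and `--lr 0.5` flags, the sketch below reproduces the half-period cosine learning-rate schedule used by the MoCo training script. It assumes that main_cast.py keeps MoCo's schedule unchanged and that training runs for 200 epochs; both are assumptions, and the function name `cosine_lr` is made up for this example.

```python
import math


def cosine_lr(base_lr, epoch, total_epochs):
    """Half-period cosine decay from base_lr (epoch 0) towards 0 (last epoch).

    Mirrors the schedule in MoCo's training script; that main_cast.py
    uses it unchanged is an assumption.
    """
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))


if __name__ == "__main__":
    # total_epochs=200 is an assumed training length, not taken from the command.
    for epoch in (0, 50, 100, 150, 199):
        print(epoch, round(cosine_lr(0.5, epoch, 200), 4))
```

Under these assumptions the learning rate starts at 0.5, reaches half of that value at the midpoint of training, and approaches zero by the final epoch.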

ImageNet Linear Classification

With a pre-trained model, to train a supervised linear classifier on frozen features/weights on an 8-GPU machine, run:

python main_lincls.py \
  -a resnet50 \
  --lr 30.0 \
  --batch-size 256 \
  --pretrained [your checkpoint path]/checkpoint_200.pth \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]
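To illustrate what "frozen features/weights" involves when loading a checkpoint: in the MoCo code this repository builds on, the linear-evaluation script keeps only the query-encoder backbone weights from the checkpoint and discards the key encoder, the queue, and the projection head. The sketch below shows that filtering on a plain dictionary. The `module.encoder_q.` prefix follows MoCo's naming; that CAST checkpoints use the same key names is an assumption, and `extract_backbone_weights` is a hypothetical helper, not a function from this repository.

```python
def extract_backbone_weights(state_dict, prefix="module.encoder_q.", head="fc."):
    """Keep query-encoder backbone entries and strip the prefix from their keys.

    Entries for the key encoder, the queue, and the head (prefix + head)
    are dropped. The default prefix follows MoCo's checkpoint naming;
    whether CAST checkpoints match it is an assumption.
    """
    backbone = {}
    for key, value in state_dict.items():
        if key.startswith(prefix) and not key.startswith(prefix + head):
            backbone[key[len(prefix):]] = value
    return backbone
```

The resulting dictionary has keys such as `conv1.weight`, which match a standard ResNet-50, so it can be loaded into the backbone while only a freshly initialized final linear layer is trained.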

Code organization

main_cast.py contains the main code for our approach, CAST.

grad_cam.py contains functions that compute Grad-CAM maps for a specified layer.

moco/datasets.py contains our Saliency Constrained Random Cropping data augmentation procedure. This uses functions from moco/augumentations/transforms.py and moco/augmentations/functional.py.

main_lincls.py contains code to evaluate our self-supervised model on the downstream task of ImageNet linear classification.
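As a rough illustration of the idea behind saliency-constrained random cropping, the sketch below samples random crop boxes and accepts one only if it covers at least a minimum fraction of the salient pixels, falling back to the full image if no sample qualifies. This is not the code in moco/datasets.py: the acceptance rule, the `min_fraction=0.2` default, the uniform box sampling, and both function names are assumptions chosen for the example.

```python
import random


def salient_fraction(mask, top, left, height, width):
    """Fraction of the mask's salient (non-zero) pixels that lie inside the crop."""
    total = sum(1 for row in mask for v in row if v)
    if total == 0:
        return 0.0
    inside = sum(
        1 for row in mask[top:top + height] for v in row[left:left + width] if v
    )
    return inside / total


def saliency_constrained_crop(mask, min_fraction=0.2, max_tries=50, rng=None):
    """Sample a crop (top, left, height, width) covering enough salient pixels.

    mask is a 2D list of 0/1 values. The threshold and the sampling scheme
    are illustrative assumptions, not the repository's settings.
    """
    rng = rng or random.Random()
    full_h, full_w = len(mask), len(mask[0])
    for _ in range(max_tries):
        height = rng.randint(1, full_h)
        width = rng.randint(1, full_w)
        top = rng.randint(0, full_h - height)
        left = rng.randint(0, full_w - width)
        if salient_fraction(mask, top, left, height, width) >= min_fraction:
            return top, left, height, width
    # No sampled box met the constraint: fall back to the whole image.
    return 0, 0, full_h, full_w
```

Passing a seeded `random.Random` instance as `rng` makes the sampled crops reproducible, which is convenient when inspecting them against the saliency map.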


    title = {CASTing Your Model: Learning to Localize Improves Self-Supervised Representations},
    author = {Ramprasaath R. Selvaraju and Karan Desai and Justin Johnson and Nikhil Naik},
    booktitle = {CVPR},
    year = {2021}
