NamedMask: Distilling Segmenters from Complementary Foundation Models

Official PyTorch implementation for NamedMask. Details can be found in the paper. [Paper] [Project page]

Preparation
NamedMask training/inference
Pre-trained weights
Citation
Acknowledgements

Preparation

1. Download datasets

Please download datasets of interest first by visiting the following links:

Cityscapes
CoCA
COCO2017
VOC2012
(Optional) ImageNet2012 (for an index dataset used in training)

Note that Cityscapes and ImageNet2012 require you to sign up for an account. ImageNet2012 is only needed if you want to train NamedMask yourself.

We advise you to put the downloaded dataset(s) into the following directory structure for ease of implementation:

{your_dataset_directory}
├──cityscapes
│  ├──gtFine
│  ├──leftImg8bit
├──coca
│  ├──binary
│  ├──image
├──coco2017
│  ├──annotations
│  ├──train2017
│  ├──val2017
├──ImageNet2012
│  ├──train
│  ├──val
├──ImageNet-S
│  ├──ImageNetS50
│  ├──ImageNetS300
│  ├──ImageNetS919
├──VOCdevkit
   ├──VOC2012
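Before moving on, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of the NamedMask code: the helper name find_missing_dirs and the EXPECTED_LAYOUT dictionary are hypothetical, and the sub-directory names are copied from the tree above.

```python
from pathlib import Path

# Sub-directories copied from the recommended layout above.
# Remove the entries for datasets you did not download.
EXPECTED_LAYOUT = {
    "cityscapes": ["gtFine", "leftImg8bit"],
    "coca": ["binary", "image"],
    "coco2017": ["annotations", "train2017", "val2017"],
    "ImageNet2012": ["train", "val"],
    "ImageNet-S": ["ImageNetS50", "ImageNetS300", "ImageNetS919"],
    "VOCdevkit": ["VOC2012"],
}


def find_missing_dirs(dataset_root, layout=EXPECTED_LAYOUT):
    """Return the expected sub-directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for dataset, subdirs in layout.items():
        for subdir in subdirs:
            if not (root / dataset / subdir).is_dir():
                missing.append(str(Path(dataset) / subdir))
    return missing
```

Calling find_missing_dirs("{your_dataset_directory}") returns a list of relative paths that still need to be created or downloaded. An empty list means the layout is complete.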

2. Install the required Python packages:

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c conda-forge tqdm
conda install -c conda-forge matplotlib
conda install -c anaconda ujson
conda install -c conda-forge pyyaml
conda install -c conda-forge pycocotools
conda install -c anaconda scipy
pip install opencv-python
pip install git+https://github.com/openai/CLIP.git

Please note that the required version of each package may vary depending on your local setup.
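To check that the environment is complete, the sketch below looks up each package by its import name without importing it. It is an illustration rather than part of the repository: the function name find_missing_modules is hypothetical, and the import names assume the usual mapping (opencv-python is imported as cv2, pyyaml as yaml, and the CLIP repository as clip).

```python
import importlib.util

# Import names for the packages installed above (assumed mapping:
# opencv-python -> cv2, pyyaml -> yaml, openai/CLIP -> clip).
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "tqdm", "matplotlib",
    "ujson", "yaml", "pycocotools", "scipy", "cv2", "clip",
]


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

An empty list from find_missing_modules() means every package is visible to the current Python interpreter. It does not verify package versions or CUDA support.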

NamedMask training/inference

NamedMask is trained with pseudo-labels from either an unsupervised saliency detector (e.g., SelfMask) or category experts, which refine the predictions made by the saliency network. We therefore need to generate pseudo-labels before training NamedMask. You can skip this part if you only want to run inference with the pre-trained weights provided below.

1. Generate pseudo-labels

To compute pseudo-masks for images of the categories in Cityscapes, COCO2017, CoCA, or VOC2012, we provide for each benchmark a dictionary file (.json format) that maps each category to a list of 500 ImageNet2012 image paths retrieved by CLIP (with the ViT-L/14@336px architecture). This file has the following structure:

{
    "category_a": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
    "category_b": ["{your_imagenet_dir}/train/xxx.JPEG", ..., "{your_imagenet_dir}/train/xxx.JPEG"],
    ...
}

You need to change {your_imagenet_dir} before loading this file for the following steps (by default, it's set to /home/cs-shin1/datasets/ImageNet2012).
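One way to make this change is a short script that rewrites the path prefix in the dictionary file. This is a minimal sketch, assuming every path in the file starts with the default prefix mentioned above; the function names rewrite_imagenet_prefix and rewrite_file are hypothetical and not part of the repository.

```python
import json

# Default prefix stated above; paths that do not start with it are left unchanged.
DEFAULT_PREFIX = "/home/cs-shin1/datasets/ImageNet2012"


def rewrite_imagenet_prefix(category_to_paths, new_prefix, old_prefix=DEFAULT_PREFIX):
    """Return a copy of the mapping with old_prefix replaced by new_prefix in each path."""
    new_prefix = new_prefix.rstrip("/")
    rewritten = {}
    for category, paths in category_to_paths.items():
        rewritten[category] = [
            new_prefix + p[len(old_prefix):] if p.startswith(old_prefix) else p
            for p in paths
        ]
    return rewritten


def rewrite_file(src_fp, dst_fp, new_prefix, old_prefix=DEFAULT_PREFIX):
    """Load the dictionary file, rewrite its paths, and save the result to dst_fp."""
    with open(src_fp) as f:
        mapping = json.load(f)
    with open(dst_fp, "w") as f:
        json.dump(rewrite_imagenet_prefix(mapping, new_prefix, old_prefix), f)
```

Writing to a separate dst_fp keeps the downloaded file intact, so the rewrite can be repeated if the dataset is moved again.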

Please download a dictionary file for a benchmark on which you want to evaluate and put it in the ImageNet2012 directory:

Then, open selfmask.sh in the scripts directory and change

DIR_ROOT={your_working_directory}
DIR_DATASET={your_ImageNet2012_directory}
CATEGORY_TO_P_IMAGES_FP={your_category_to_p_images_fp}  # this should point to a json file you downloaded above

Run,

bash selfmask.sh

This will generate pseudo-masks for images retrieved by CLIP (with ViT-L/14@336px architecture) from the ImageNet2012 training set. The pseudo-masks will be saved at {your_ImageNet2012_directory}/train_pseudo_masks_selfmask.

If you want to skip this process, please download the pre-computed pseudo-masks and uncompress them into {your_ImageNet2012_directory}/train_pseudo_masks_selfmask:

pseudo-masks from SelfMask (~89 MB)

Optionally, if you want to refine the pseudo-masks with a category expert (after finishing the above step), open the expert_$DATASET_NAME_category.sh file and configure DIR_ROOT and CATEGORY_TO_P_IMAGES_FP as appropriate. Then,

bash expert_$DATASET_NAME_category.sh

Currently, we only provide code for training experts of the VOC2012 categories. The pseudo-masks will be saved at {your_ImageNet2012_directory}/train_pseudo_masks_experts.

If you want to skip this process, please download the pre-computed pseudo-masks:

Cityscapes pseudo-masks from category experts (~ 6.5 MB)
CoCA pseudo-masks from category experts (~ 36 MB)
COCO2017 pseudo-masks from category experts (~ 36 MB)
VOC2012 pseudo-masks from category experts (~ 11 MB)

Please uncompress the .zip file into {your_ImageNet2012_directory}/train_pseudo_masks_experts.

2. Training

Once pseudo-masks are created (or downloaded and uncompressed), set a path to the directory that contains the pseudo-masks in a configuration file. For example, to train a model with pseudo-masks from experts for the VOC2012 categories, open the voc_val_n500_cp2_ex.yaml file and change

category_to_p_images_fp: {your_category_to_p_images_fp}  # this should point to a json file you downloaded above
dir_ckpt: {your_dir_ckpt}  # this should point to a checkpoint directory
dir_train_dataset: {your_dir_train_dataset}  # this should point to ImageNet2012 directory (as an index dataset)
dir_val_dataset: {your_dir_val_dataset}  # this should point to a benchmark directory

arguments as appropriate.

Then, run

bash voc_val_n500_cp2_sr10100_ex.sh

Note that the model is evaluated at regular intervals (every fixed number of iterations) during training, and the final weights are saved in your checkpoint directory.

3. Inference

To evaluate a model with pre-trained weights on a benchmark, e.g., VOC2012, please run (after customising the four arguments above)

bash voc_val_n500_cp2_sr10100_ex.sh $PATH_TO_WEIGHTS

Pre-trained weights

We provide the pre-trained weights of NamedMask:

benchmark	split	IoU (%)	pixel accuracy (%)	link
Cityscapes (object)	val	18.2	93.0	weights (~102 MB)
CoCA	-	27.4	82.0	weights (~102 MB)
COCO2017	val	27.7	76.4	weights (~102 MB)
ImageNet-S50	test	47.5	-	weights (~102 MB)
ImageNet-S300	test	33.1	-	weights (~103 MB)
ImageNet-S919	test	23.1	-	weights (~103 MB)
VOC2012	val	59.3	89.2	weights (~102 MB)
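For readers unfamiliar with the two metrics in the table, the sketch below shows how they are commonly computed from a confusion matrix. It is not the evaluation code of this repository: it assumes the IoU column is a mean over categories, that labels are flattened lists of integer class indices, and that 255 marks ignored pixels (a common convention, assumed here). All function names are hypothetical.

```python
def confusion_matrix(pred, target, n_classes, ignore_index=255):
    """Count pixels per (ground truth, prediction) pair; rows are ground truth."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed convention for unlabelled pixels
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Mean over classes of TP / (TP + FP + FN), in percent."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(row[c] for row in matrix) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


def pixel_accuracy(matrix):
    """Share of non-ignored pixels whose predicted class matches the ground truth, in percent."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total if total else 0.0
```

Pixel accuracy can be high even when IoU is low, because large background regions dominate the pixel count while IoU weights every category equally.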

We additionally offer the pre-trained weights of the category experts for 20 classes in VOC2012:

category	link
aeroplane	weights (~102 MB)
bicycle	weights (~102 MB)
bird	weights (~102 MB)
boat	weights (~102 MB)
bottle	weights (~102 MB)
bus	weights (~102 MB)
car	weights (~102 MB)
cat	weights (~102 MB)
chair	weights (~102 MB)
cow	weights (~102 MB)
dining table	weights (~102 MB)
dog	weights (~102 MB)
horse	weights (~102 MB)
motorbike	weights (~102 MB)
person	weights (~102 MB)
potted plant	weights (~102 MB)
sheep	weights (~102 MB)
sofa	weights (~102 MB)
train	weights (~102 MB)
tv/monitor	weights (~102 MB)

Citation

@article{shin2022namedmask,
  author = {Shin, Gyungin and Xie, Weidi and Albanie, Samuel},
  title = {NamedMask: Distilling Segmenters from Complementary Foundation Models},
  journal = {arXiv:},
  year = {2022}
}

Acknowledgements

We borrowed the code for SelfMask and DeepLabv3+ from

If you have any questions about our code/implementation, please contact us at gyungin [at] robots [dot] ox [dot] ac [dot] uk.