Self-Supervised Visual Representation Learning with Semantic Grouping

<p align="center"> <a href="https://proceedings.neurips.cc/paper_files/paper/2022/hash/6818dcc65fdf3cbd4b05770fb957803e-Abstract-Conference.html"><img src="https://img.shields.io/badge/-NeurIPS%202022-68488b"></a> <a href="https://arxiv.org/abs/2205.15288"><img src="https://img.shields.io/badge/arXiv-2205.15288-b31b1b"></a> <a href="https://wen-xin.info/slotcon"><img src="https://img.shields.io/badge/Project-Website-blue"></a> <a href="https://connecthkuhk-my.sharepoint.com/:f:/g/personal/xwen_connect_hku_hk/Etg2mBDKbFdPgO0W7CX5m94BAVqwX8XLhsLThlMXHIa8hg"><img src="https://img.shields.io/badge/ModelZoo-OneDrive-blue"></a> <a href="https://github.com/CVMI-Lab/SlotCon/blob/master/LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"></a> </p> <p align="center"> Self-Supervised Visual Representation Learning with Semantic Grouping (NeurIPS 2022)<br> By <a href="https://wen-xin.info">Xin Wen</a>, <a href="https://bzhao.me/">Bingchen Zhao</a>, <a href="https://dblp.org/pid/208/4164.html">Anlin Zheng</a>, <a href="https://scholar.google.com/citations?user=yuB-cfoAAAAJ">Xiangyu Zhang</a>, and <a href="https://xjqi.github.io/">Xiaojuan Qi</a>. </p>

Introduction

We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning. The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature and form new slots. Based on the learned data-dependent slots, a contrastive objective is employed for representation learning, which enhances the discriminability of features, and conversely facilitates grouping semantically coherent pixels together.
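The grouping step described above can be illustrated with a small, dependency-free sketch. It is a toy illustration and not the code in this repo: the function names, the temperature value, and the use of cosine similarity with a softmax over prototypes are assumptions made for the example. Each pixel feature is softly assigned to the prototypes, and each slot is then formed by attentive pooling, i.e. a weighted mean of pixel features using the assignment weights.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def group_pixels(features, prototypes, temperature=0.1):
    """Softly assign each pixel to the prototypes, then pool features into slots.

    features:   N pixel feature vectors (list of lists of floats)
    prototypes: K prototype vectors of the same dimension
    Returns (assignments, slots): an N x K soft assignment matrix and
    K slot vectors, each a weighted mean of the pixel features.
    """
    protos = [normalize(p) for p in prototypes]
    assignments = []
    for f in features:
        f_unit = normalize(f)
        # Temperature-scaled cosine similarity to every prototype.
        sims = [sum(a * b for a, b in zip(f_unit, p)) / temperature for p in protos]
        assignments.append(softmax(sims))

    dim = len(features[0])
    slots = []
    for k in range(len(prototypes)):
        weights = [row[k] for row in assignments]
        total = sum(weights)  # always positive, since softmax outputs are positive
        slots.append(
            [sum(w * f[d] for w, f in zip(weights, features)) / total for d in range(dim)]
        )
    return assignments, slots

# Four "pixels" in a 2-D feature space and two prototypes (hypothetical values).
pixels = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
assignments, slots = group_pixels(pixels, prototypes)
hard_labels = [row.index(max(row)) for row in assignments]
print(hard_labels)  # [0, 0, 1, 1]
```

Because the slots are pooled from each image's own features, they adapt to the sample even though the prototypes are shared across the dataset.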

[Figure: SlotCon framework]

Compared with previous efforts, by simultaneously optimizing the two coupled objectives of semantic grouping and contrastive learning, our approach bypasses the disadvantages of hand-crafted priors and is able to learn object/group-level representations from scene-centric images. Experiments show our approach effectively decomposes complex scenes into semantic groups for feature learning and significantly benefits downstream tasks, including object detection, instance segmentation, and semantic segmentation.

Pretrained models

| Method | Dataset | Epochs | Arch | AP<sup>b</sup> | AP<sup>m</sup> | Download |
|---|---|---|---|---|---|---|
| SlotCon | COCO | 800 | ResNet-50 | 41.0 | 37.0 | script \| backbone only \| full ckpt |
| SlotCon | COCO+ | 800 | ResNet-50 | 41.8 | 37.8 | script \| backbone only \| full ckpt |
| SlotCon | ImageNet-1K | 100 | ResNet-50 | 41.4 | 37.2 | script \| backbone only \| full ckpt |
| SlotCon | ImageNet-1K | 200 | ResNet-50 | 41.8 | 37.8 | script \| backbone only \| full ckpt |

Folder containing all the checkpoints: [link].

Getting started

Requirements

This project was developed with python==3.9 and pytorch==1.10.0. Be aware of possible compatibility issues if you use other versions.

The following is an example of setting up the experimental environment:

conda create -n slotcon python=3.9 -y
conda activate slotcon
conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.3 -c pytorch
git clone https://github.com/CVMI-Lab/SlotCon && cd ./SlotCon
mkdir datasets
ln -s ${PATH_TO_COCO} ./datasets/coco
ln -s ${PATH_TO_IMAGENET} ./datasets/imagenet
pip install -r requirements.txt
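After the setup, a quick check like the following can confirm that the environment matches the versions listed above. This is a sketch and not part of the repo: `check_environment` is a hypothetical helper, and it assumes the installed packages expose metadata under the distribution names `torch` and `torchvision`.

```python
import sys
from importlib import metadata

# Versions stated in the Requirements section above.
EXPECTED = {"torch": "1.10.0", "torchvision": "0.11.0"}

def check_environment(expected=EXPECTED, python=(3, 9)):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    if sys.version_info[:2] != python:
        problems.append(
            f"python {sys.version_info[0]}.{sys.version_info[1]} != {python[0]}.{python[1]}"
        )
    for package, wanted in expected.items():
        try:
            found = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (expected {wanted})")
            continue
        # Builds may carry a local suffix such as "+cu113"; compare the base version only.
        if found.split("+")[0] != wanted:
            problems.append(f"{package} {found} != {wanted}")
    return problems

if __name__ == "__main__":
    for line in check_environment() or ["environment matches the documented versions"]:
        print(line)
```

A mismatch does not necessarily mean the code will fail; it only flags where your setup differs from the one the project was developed with.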

Run pre-training

By default, we train with DDP over 8 GPUs on a single machine. The following scripts reproduce SlotCon pre-training on COCO, COCO+, and ImageNet:

./scripts/slotcon_coco_r50_800ep.sh
./scripts/slotcon_cocoplus_r50_800ep.sh
./scripts/slotcon_imagenet_r50_100ep.sh

Evaluation: Object Detection & Instance Segmentation

Please install detectron2 and prepare the dataset first following the official instructions: [installation] [data preparation]

The following example evaluates a pre-trained model on COCO:

mkdir transfer/detection/datasets
ln -s ${PATH_TO_COCO} transfer/detection/datasets/
python transfer/detection/convert_pretrain_to_d2.py output/${EXP_NAME}/ckpt_epoch_xxx.pth ${EXP_NAME}.pkl
cd transfer/detection &&
python train_net.py --config-file configs/COCO_R_50_FPN_1x_SlotCon.yaml --num-gpus 8 --resume MODEL.WEIGHTS ../../${EXP_NAME}.pkl OUTPUT_DIR ../../output/COCO_R_50_FPN_1x_${EXP_NAME}
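The `ckpt_epoch_xxx.pth` placeholder in the conversion command stands for a specific checkpoint file. If you want to pick the most recent one automatically, a helper along these lines may be useful. It is a sketch and not part of the repo: `latest_checkpoint` is a hypothetical name, and it assumes that `xxx` is an integer epoch number.

```python
import re
from pathlib import Path

# Assumed naming scheme: ckpt_epoch_<integer>.pth
CKPT_PATTERN = re.compile(r"ckpt_epoch_(\d+)\.pth$")

def latest_checkpoint(exp_dir):
    """Return the ckpt_epoch_<N>.pth path with the largest N, or None if none exist."""
    best_epoch, best_path = -1, None
    for path in Path(exp_dir).glob("ckpt_epoch_*.pth"):
        match = CKPT_PATTERN.search(path.name)
        # Compare epochs numerically so that epoch 100 ranks above epoch 9.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path

# Example: latest_checkpoint("output/my_exp") with a hypothetical experiment name.
```

The numeric comparison matters because a plain alphabetical sort would place `ckpt_epoch_9.pth` after `ckpt_epoch_100.pth`.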

Evaluation: Semantic Segmentation

Please install mmsegmentation and prepare the datasets first following the official instructions: [installation] [data preparation]

mkdir transfer/segmentation/data
ln -s ${PATH_TO_DATA} transfer/segmentation/data/
python transfer/segmentation/convert_pretrain_to_mm.py output/${EXP_NAME}/ckpt_epoch_xxx.pth ${EXP_NAME}.pth
# each run below starts from the repository root
# run pascal voc
cd transfer/segmentation &&
bash mim_dist_train.sh configs/voc12aug/fcn_d6_r50-d16_513x513_30k_voc12aug_moco.py ../../${EXP_NAME}.pth 2
# run cityscapes
cd transfer/segmentation &&
bash mim_dist_train.sh configs/cityscapes/fcn_d6_r50-d16_769x769_90k_cityscapes_moco.py ../../${EXP_NAME}.pth 2
# run ade20k
cd transfer/segmentation &&
bash mim_dist_train.sh configs/ade20k/fcn_r50-d8_512x512_80k_ade20k.py ../../${EXP_NAME}.pth 4

Prototype Visualization

We also provide the code for visualizing the learned prototypes' nearest neighbors. The following command requires a full checkpoint rather than a backbone-only one.

# --topk 5: retrieve 5 nearest neighbors for each prototype
# --sampling 20: randomly sample 20 prototypes to visualize
python viz_slots.py \
    --data_dir ${PATH_TO_COCO} \
    --model_path ${PATH_TO_MODEL} \
    --save_path ${PATH_TO_SAVE} \
    --topk 5 \
    --sampling 20
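To make the two options concrete, the sketch below shows what "top-k nearest neighbors for a random sample of prototypes" can look like on toy data. It is not the logic of `viz_slots.py`: the function names, the cosine-similarity ranking, and the fixed seed are assumptions made for illustration.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def topk_neighbors(prototypes, candidates, topk=5, sampling=20, seed=0):
    """Map each sampled prototype index to the indices of its top-k candidates."""
    rng = random.Random(seed)
    # Randomly pick which prototypes to inspect (all of them if there are few).
    chosen = rng.sample(range(len(prototypes)), min(sampling, len(prototypes)))
    result = {}
    for k in sorted(chosen):
        ranked = sorted(
            range(len(candidates)),
            key=lambda i: cosine(prototypes[k], candidates[i]),
            reverse=True,
        )
        result[k] = ranked[:topk]
    return result

# Two prototypes and four candidate vectors (hypothetical values).
toy_prototypes = [[1.0, 0.0], [0.0, 1.0]]
toy_candidates = [[0.9, 0.1], [0.1, 0.9], [1.0, 1.0], [-1.0, 0.0]]
neighbors = topk_neighbors(toy_prototypes, toy_candidates, topk=2, sampling=2)
print(neighbors)  # {0: [0, 2], 1: [1, 2]}
```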

[Figure: concepts]

Citing this work

If you find this repo useful for your research, please consider citing our paper:

@inproceedings{wen2022slotcon,
  title={Self-Supervised Visual Representation Learning with Semantic Grouping},
  author={Wen, Xin and Zhao, Bingchen and Zheng, Anlin and Zhang, Xiangyu and Qi, Xiaojuan},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgment

Our codebase builds upon several publicly available codebases. Specifically, we have modified and integrated the following repos into this project:

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.