# Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation
## Abstract
This study introduces an effective approach, Masked Collaborative Contrast (MCC), to highlight semantic regions in weakly supervised semantic segmentation. MCC adroitly incorporates concepts from masked image modeling and contrastive learning to devise Transformer blocks that induce keys to contract toward semantically pertinent regions. Unlike prevalent techniques that directly erase patch regions in the input image when generating masks, we exploit the neighborhood relations of patch tokens by deriving masks from the key-based affinity matrix. Moreover, we generate positive and negative samples for contrastive learning by masking the local output and contrasting it with the global output. Extensive experiments on commonly used datasets demonstrate that the proposed MCC mechanism effectively aligns global and local views within the image, attaining impressive performance.
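To make the mechanism concrete, below is a minimal PyTorch sketch of the two ingredients described above: deriving a token mask from the key affinity matrix, and contrasting a masked local output with the global output. This is an illustrative sketch only, not the repository's implementation; the masking rule (dropping the least-affine tokens), the InfoNCE loss form, and the `mask_ratio`/`tau` values are assumptions.

```python
# Illustrative sketch of the two MCC ingredients (assumptions noted above);
# see the paper and the training scripts for the actual formulation.
import torch
import torch.nn.functional as F

def affinity_mask(k, mask_ratio=0.4):
    """Derive a token mask from the key affinity matrix instead of erasing
    input patches. k: patch keys of shape (B, N, C); returns a (B, N) mask
    where 0 marks masked tokens."""
    affinity = torch.einsum("bnc,bmc->bnm", k, k) / k.shape[-1] ** 0.5
    scores = affinity.mean(dim=-1)                  # mean affinity per token
    n_mask = int(mask_ratio * scores.shape[1])
    drop = scores.topk(n_mask, dim=1, largest=False).indices
    return torch.ones_like(scores).scatter(1, drop, 0.0)

def global_local_contrast(g, l, tau=0.07):
    """InfoNCE between the pooled global output g and the masked local
    output l, both (B, D): matching rows in the batch are positives,
    all other rows serve as negatives."""
    g, l = F.normalize(g, dim=-1), F.normalize(l, dim=-1)
    logits = g @ l.t() / tau                        # (B, B) cosine similarities
    targets = torch.arange(g.shape[0], device=g.device)
    return F.cross_entropy(logits, targets)
```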
<div align="center"> <br> <img width="100%" src="./framework.png"> </div>

## Data Preparations
### VOC dataset
1. Download
```bash
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
```
2. Download the augmented annotations
The augmented annotations can be downloaded from the SBD dataset. After downloading `SegmentationClassAug.zip`, unzip it and move it to `VOCdevkit/VOC2012`.
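For example, assuming the archive unpacks to a `SegmentationClassAug/` folder and you run this from the directory containing `VOCdevkit/`:

```bash
unzip SegmentationClassAug.zip
mv SegmentationClassAug VOCdevkit/VOC2012/
```

The expected directory layout: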
```
VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
```
### COCO dataset
1. Download
```bash
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
```
Unzip the downloaded archives and arrange the train and validation images in a VOC-style directory structure, as shown below (example commands follow the tree).
```
MSCOCO/
├── annotations
├── JPEGImages
│   ├── train2014
│   └── val2014
└── SegmentationClass
    ├── train2014
    └── val2014
```
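One possible way to arrange the files (paths are illustrative; run from the directory where the zips were downloaded). Note that the label-generation step below also expects the COCO annotation files (e.g. `annotations_trainval2014.zip` from the same host) under `MSCOCO/annotations`:

```bash
mkdir -p MSCOCO/JPEGImages MSCOCO/SegmentationClass
unzip -q train2014.zip -d MSCOCO/JPEGImages   # creates MSCOCO/JPEGImages/train2014
unzip -q val2014.zip -d MSCOCO/JPEGImages     # creates MSCOCO/JPEGImages/val2014
```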
2. Generate VOC-style segmentation labels for COCO
To generate VOC-style segmentation labels for the COCO dataset, run `parse_coco.py`:
```bash
python ./datasets/parse_coco.py --split train --year 2014 --to-voc12 false --coco-path $coco_path
python ./datasets/parse_coco.py --split val --year 2014 --to-voc12 false --coco-path $coco_path
```
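For intuition, here is a minimal sketch of what VOC-style label generation from COCO annotations involves. This is not the repository's `parse_coco.py`; the file paths and the category-id-to-label-index mapping are illustrative assumptions.

```python
# Sketch of COCO-to-VOC-style label conversion (illustrative assumptions only;
# use the repository's parse_coco.py for the actual label generation).
import numpy as np
from PIL import Image
from pycocotools.coco import COCO

coco = COCO("MSCOCO/annotations/instances_train2014.json")
cat_ids = sorted(coco.getCatIds())                        # the 80 COCO categories
cat_to_label = {c: i + 1 for i, c in enumerate(cat_ids)}  # reserve 0 for background

for img_id in coco.getImgIds():
    info = coco.loadImgs(img_id)[0]
    label = np.zeros((info["height"], info["width"]), dtype=np.uint8)
    for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
        mask = coco.annToMask(ann)                        # binary mask of one instance
        label[mask == 1] = cat_to_label[ann["category_id"]]
    name = info["file_name"].replace(".jpg", ".png")
    Image.fromarray(label).save(f"MSCOCO/SegmentationClass/train2014/{name}")
```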
## Create environment
Clone this repo
```bash
git clone https://github.com/fwu11/mcc.git
cd mcc
```
Install the dependencies
```bash
conda create -n py38 python=3.8
conda activate py38
conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirement.txt
```
## Build Reg Loss
To use the regularized loss, download and compile the Python extension; see Here.
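The build procedure depends on the linked code; if the extension uses a standard setuptools layout, compilation typically looks like the following (an assumption; defer to the linked instructions):

```bash
# Assumes a setuptools-based extension; follow the linked instructions
# if the project uses a different build system.
python setup.py build_ext --inplace
```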
## Create softlinks to the datasets
```bash
ln -s $your_dataset_path/VOCdevkit VOCdevkit
ln -s $your_dataset_path/MSCOCO MSCOCO
```
## Train
```bash
## for VOC
CUDA_VISIBLE_DEVICES="0,1,2,3" python -m torch.distributed.launch --nproc_per_node=4 --master_port=29501 scripts/dist_train_voc_seg_neg.py --work_dir work_dir_voc --spg 1

## for COCO
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -m torch.distributed.launch --nproc_per_node=8 --master_port=29501 scripts/dist_train_coco_seg_neg.py --work_dir work_dir_coco --spg 1
```
## Evaluation
```bash
## for VOC
python tools/infer_seg_voc.py --model_path $model_path --backbone vit_base_patch16_224 --infer val

## for COCO
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -m torch.distributed.launch --nproc_per_node=8 --master_port=29501 tools/infer_seg_coco.py --model_path $model_path --backbone vit_base_patch16_224 --infer val
```
## Results
Here we report the performance on the VOC and COCO datasets. `MS+CRF` denotes multi-scale testing and CRF post-processing.
| Dataset | Backbone | val | Log | Weights | val (with MS+CRF) | test (with MS+CRF) |
|---|---|---|---|---|---|---|
| VOC | DeiT-B | 68.8 | log | weights | 70.3 | 71.2 |
| COCO | DeiT-B | 41.1 | log | weights | 42.3 | -- |
## Citation
```bibtex
@inproceedings{wu2024masked,
    title={Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation},
    author={Wu, Fangwen and He, Jingxuan and Yin, Yufei and Hao, Yanbin and Huang, Gang and Cheng, Lechao},
    booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
    pages={862--871},
    year={2024}
}
```
## Acknowledgement
Our code is developed based on ToCo. We also use the Regularized Loss and DenseCRF. We appreciate their great work.