SemFormer

The official code for SemFormer: Semantic Guided Activation Transformer for Weakly Supervised Semantic Segmentation.

Runtime Environment

Usage

Install python dependencies

python -m pip install -r requirements.txt

Download PASCAL VOC 2012 devkit

Follow the instructions at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit.

Train and evaluate the model.

1. Train SemFormer for generating CAMs

1.1 Train CAAE.

CUDA_VISIBLE_DEVICES=0,1 python train_caae.py --tag CAAE@DeiT-B-Dist

1.2 Train SemFormer.

CUDA_VISIBLE_DEVICES=0,1 python train_semformer.py --tag SemFormer@CAAE@DeiT-B-Dist

Alternatively, use the checkpoint we provide in experiments/models/SemFormer@CAAE@DeiT-B-Dist.pth.

2. Run SemFormer inference to generate CAMs

CUDA_VISIBLE_DEVICES=0 python inference_semformer.py --tag SemFormer@CAAE@DeiT-B-Dist --domain train_aug

Evaluate CAMs. [optional]

python evaluate.py --experiment_name SemFormer@CAAE@DeiT-B-Dist@train@scale=0.5,1.0,1.5,2.0 --domain train
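For orientation, the mIoU metric reported by this kind of evaluation can be sketched as follows. This is a minimal illustration, not the repository's `evaluate.py`: the function name `mean_iou` and its parameters are hypothetical, and it assumes predictions and labels have already been flattened to per-pixel class ids, with 255 marking ignored pixels as in the PASCAL VOC annotations.

```python
def mean_iou(preds, labels, num_classes, ignore_index=255):
    """Return mean IoU (%) over flat sequences of predicted and ground-truth class ids.

    Hypothetical helper for illustration; not part of this repository.
    """
    tp = [0] * num_classes  # pixels correctly assigned to class c
    fp = [0] * num_classes  # pixels wrongly predicted as class c
    fn = [0] * num_classes  # pixels of class c predicted as something else

    for p, g in zip(preds, labels):
        if g == ignore_index:
            continue  # skip pixels without a valid ground-truth label
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:  # average only over classes that actually occur
            ious.append(tp[c] / denom)

    return 100.0 * sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with two classes and one ignored pixel.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2))
```

For PASCAL VOC 2012, `num_classes` would be 21 (20 object classes plus background).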

3. Apply Random Walk (RW) to refine the generated CAMs

3.1 Make affinity labels to train AffinityNet.

python make_affinity_labels.py --experiment_name SemFormer@CAAE@DeiT-B-Dist@train@scale=0.5,1.0,1.5,2.0 --domain train_aug

3.2 Train AffinityNet using the generated affinity labels.

CUDA_VISIBLE_DEVICES=0,1 python train_affinitynet.py --tag AffinityNet@SemFormer --label_name SemFormer@CAAE@DeiT-B-Dist@train@scale=0.5,1.0,1.5,2.0@aff_fg=0.11_bg=0.15

4. Make pseudo labels.

4.1 Run random-walk inference (AffinityNet) to refine the generated CAMs.

CUDA_VISIBLE_DEVICES=0 python inference_rw.py --model_name AffinityNet@SemFormer --cam_dir SemFormer@CAAE@DeiT-B-Dist@train@scale=0.5,1.0,1.5,2.0 --domain train_aug
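The idea behind random-walk refinement can be sketched in a few lines. This is a rough, pure-Python illustration in the style of AffinityNet propagation, not the code in `inference_rw.py`: the function names are hypothetical, and it assumes that `beta` sharpens the predicted affinities by exponentiation and that `exp_times` counts repeated squarings of the transition matrix (the values 10 and 8 are taken from the experiment name used in step 4.2).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def random_walk(cam, affinity, beta=10, exp_times=8):
    """Propagate CAM scores along pixel affinities (hypothetical helper).

    cam:      C rows (one per class), each holding N per-pixel scores.
    affinity: N x N matrix with values in [0, 1] and ones on the diagonal.
    """
    n = len(affinity)
    # Sharpen affinities so that weak connections are suppressed.
    sharp = [[v ** beta for v in row] for row in affinity]
    # Normalise each column to sum to 1, giving a transition matrix.
    col_sums = [sum(sharp[i][j] for i in range(n)) for j in range(n)]
    trans = [[sharp[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    # Squaring exp_times times corresponds to 2**exp_times walk steps.
    for _ in range(exp_times):
        trans = matmul(trans, trans)
    # Each refined score is a transition-weighted average of the input scores.
    return matmul(cam, trans)


if __name__ == "__main__":
    # Pixels 0 and 1 are strongly connected; pixel 2 is isolated.
    affinity = [[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]]
    cam = [[1.0, 0.0, 0.0]]  # one class, activated only at pixel 0
    print(random_walk(cam, affinity))
```

In the toy example, the activation at pixel 0 spreads to the connected pixel 1 while the isolated pixel 2 stays at zero, which is the behaviour that lets refined CAMs follow object boundaries.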

4.2 Apply CRF to generate pseudo labels.

python make_pseudo_labels.py --experiment_name AffinityNet@SemFormer@train@beta=10@exp_times=8@rw --domain train_aug --crf_iteration 1

5. Train and evaluate the segmentation model using the pseudo labels

Please follow the instructions in this repo to train and evaluate the segmentation model.

6. Results

Quantitative segmentation results on PASCAL VOC 2012 (mIoU, %). Supervision: pixel-level ($\mathcal{F}$), box-level ($\mathcal{B}$), saliency-level ($\mathcal{S}$), and image-level ($\mathcal{I}$).

| Method | Publication | Supervision | val | test |
| --- | --- | --- | --- | --- |
| DeepLabV1 | ICLR'15 | $\mathcal{F}$ | 68.7 | 71.6 |
| DeepLabV2 | TPAMI'18 | $\mathcal{F}$ | 77.7 | 79.7 |
| BCM | CVPR'19 | $\mathcal{I} + \mathcal{B}$ | 70.2 | - |
| BBAM | CVPR'21 | $\mathcal{I} + \mathcal{B}$ | 73.7 | 73.7 |
| ICD | CVPR'20 | $\mathcal{I} + \mathcal{S}$ | 67.8 | 68.0 |
| EPS | CVPR'21 | $\mathcal{I} + \mathcal{S}$ | 71.0 | 71.8 |
| BES | ECCV'20 | $\mathcal{I}$ | 65.7 | 66.6 |
| CONTA | NeurIPS'20 | $\mathcal{I}$ | 66.1 | 66.7 |
| AdvCAM | CVPR'21 | $\mathcal{I}$ | 68.1 | 68.0 |
| OC-CSE | ICCV'21 | $\mathcal{I}$ | 68.4 | 68.2 |
| RIB | NeurIPS'21 | $\mathcal{I}$ | 68.3 | 68.6 |
| CLIMS | CVPR'22 | $\mathcal{I}$ | 70.4 | 70.0 |
| MCTFormer | CVPR'22 | $\mathcal{I}$ | 71.9 | 71.6 |
| SemFormer (ours) | - | $\mathcal{I}$ | 73.7 | 73.2 |

Acknowledgement

This repo is modified from Puzzle-CAM. Thanks to its authors for their contribution to the community.