MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection (CVPR 2022)

This is the PyTorch implementation of our paper: <br> MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection <br> IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022 <br> [arXiv]

<p align="center"> <img src="teaser/mum_phase.png" width="85%"> </p>

Installation & Setup

We follow the installation process of the official Unbiased Teacher repo (https://github.com/facebookresearch/unbiased-teacher).

Download the code
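A minimal sketch for fetching the code, with `<repo-url>` standing in for this repository's clone URL (the project directory name mix-unmix matches the dataset layout shown below):

git clone <repo-url> mix-unmix
cd mix-unmix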

Prerequisites

Build Detectron2 from Source

# get the Detectron2 v0.5 package
wget https://github.com/facebookresearch/detectron2/archive/refs/tags/v0.5.zip

# unzip
unzip v0.5.zip

# install
python -m pip install -e detectron2-0.5

Install other requirements

pip install -r requirements.txt

Dataset download

  1. Download the COCO & VOC datasets (example download commands follow the directory layout below).

  2. Organize the datasets as follows:

mix-unmix/
└── datasets/
    ├── coco/
    │   ├── train2017/
    │   ├── val2017/
    │   └── annotations/
    │       ├── instances_train2017.json
    │       └── instances_val2017.json
    ├── VOC2007
    │   ├── Annotations
    │   ├── ImageSets
    │   └── JPEGImages
    └── VOC2012
        ├── Annotations
        ├── ImageSets
        └── JPEGImages
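
The datasets are not bundled with the repo. As a rough sketch, the commands below pull the standard COCO 2017 splits and the usual VOC tarballs from their official hosts; verify the URLs and adjust paths before running:

# COCO 2017 images and annotations
mkdir -p datasets/coco
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip train2017.zip -d datasets/coco
unzip val2017.zip -d datasets/coco
unzip annotations_trainval2017.zip -d datasets/coco

# PASCAL VOC 2007 and 2012
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_06-Nov-2007.tar
tar -xf VOCtest_06-Nov-2007.tar
tar -xf VOCtrainval_11-May-2012.tar
mv VOCdevkit/VOC2007 VOCdevkit/VOC2012 datasets/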

Evaluation

| Backbone | Protocols | AP50 | AP50:95 | Model Weights |
|----------|-----------|------|---------|---------------|
| R50-FPN  | COCO-Standard 1% | 40.06 | 21.89 | link |
| R50-FPN  | COCO-Additional | 63.30 | 42.11 | link |
| R50-FPN  | VOC07 (VOC12) | 78.94 | 50.22 | link |
| R50-FPN  | VOC07 (VOC12 / COCO20cls) | 80.45 | 52.31 | link |
| Swin     | COCO-Standard 0.5% | 34.25 | 16.52 | link |
Evaluate on COCO:

python train_net.py \
      --eval-only \
      --num-gpus 1 \
      --config configs/mum_configs/coco.yaml \
      MODEL.WEIGHTS weights/<your weight>.pth

Evaluate on VOC:

python train_net.py \
      --eval-only \
      --num-gpus 1 \
      --config configs/mum_configs/voc.yaml \
      MODEL.WEIGHTS weights/<your weight>.pth

Train

To reproduce the paper results, we train with 4 GPUs (A6000 or V100 32GB).

Train on COCO:

python train_net.py \
      --num-gpus 4 \
      --config configs/mum_configs/coco.yaml

Train on VOC:

python train_net.py \
      --num-gpus 4 \
      --config configs/mum_configs/voc.yaml

Swin
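
The Swin backbone starts from ImageNet-pretrained weights. A hedged example, assuming the Swin-Tiny checkpoint published in the official Swin Transformer releases (verify the URL against that repo):

wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth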

mv swin_tiny_patch4_window7_224.pth weights/

Evaluate on COCO with the Swin backbone:

python train_net.py \
      --eval-only \
      --num-gpus 1 \
      --config configs/mum_configs/coco_swin.yaml \
      MODEL.WEIGHTS weights/<your weight>.pth

Train on COCO with the Swin backbone:

python train_net.py \
      --num-gpus 4 \
      --config configs/mum_configs/coco_swin.yaml

Mix/UnMix code block

Mixing code block

# bs: batch size, ng: group size (images mixed per group), nt: tiles per side
# Sample a random permutation of the ng images at every tile position
mask = torch.argsort(torch.rand(bs // ng, ng, nt, nt), dim=1).cuda()
# Expand the tile-level mask to pixel level (3 channels, h x w pixels)
img_mask = mask.view(bs // ng, ng, 1, nt, nt)
img_mask = img_mask.repeat_interleave(3, dim=2)
img_mask = img_mask.repeat_interleave(h // nt, dim=3)
img_mask = img_mask.repeat_interleave(w // nt, dim=4)
# Shuffle tiles across the group dimension to build the mixed images
img_tiled = images.tensor.view(bs // ng, ng, c, h, w)
img_tiled = torch.gather(img_tiled, dim=1, index=img_mask)
img_tiled = img_tiled.view(bs, c, h, w)

Unmixing code block

# Invert the tile permutation so features return to their original positions
inv_mask = torch.argsort(mask, dim=1).cuda()
feat_mask = inv_mask.view(bs // ng, ng, 1, nt, nt)
feat_mask = feat_mask.repeat_interleave(c, dim=2)
feat_mask = feat_mask.repeat_interleave(h // nt, dim=3)
feat_mask = feat_mask.repeat_interleave(w // nt, dim=4)
# Gather with the inverse permutation to unmix the feature tiles
feat_tiled = feat.view(bs // ng, ng, c, h, w)
feat_tiled = torch.gather(feat_tiled, dim=1, index=feat_mask)
feat_tiled = feat_tiled.view(bs, c, h, w)
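
To sanity-check that unmixing exactly inverts mixing, the sketch below runs both blocks on a random tensor and verifies the round trip. It is a minimal CPU example with illustrative shapes (the values of bs, ng, and nt are assumptions, and the feature map reuses the image resolution for simplicity):

import torch

bs, ng, c, h, w = 4, 4, 3, 8, 8  # batch size, group size, channels, height, width
nt = 2                           # tiles per side

images = torch.randn(bs, c, h, w)

# Mix: random per-tile permutation within each group
mask = torch.argsort(torch.rand(bs // ng, ng, nt, nt), dim=1)
img_mask = mask.view(bs // ng, ng, 1, nt, nt)
img_mask = img_mask.repeat_interleave(c, dim=2)
img_mask = img_mask.repeat_interleave(h // nt, dim=3)
img_mask = img_mask.repeat_interleave(w // nt, dim=4)
mixed = torch.gather(images.view(bs // ng, ng, c, h, w), dim=1, index=img_mask)

# Unmix: argsort of a permutation yields its inverse
inv_mask = torch.argsort(mask, dim=1)
feat_mask = inv_mask.view(bs // ng, ng, 1, nt, nt)
feat_mask = feat_mask.repeat_interleave(c, dim=2)
feat_mask = feat_mask.repeat_interleave(h // nt, dim=3)
feat_mask = feat_mask.repeat_interleave(w // nt, dim=4)
unmixed = torch.gather(mixed, dim=1, index=feat_mask).view(bs, c, h, w)

assert torch.equal(unmixed, images)  # the round trip recovers the input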

Acknowledgements

We use the official Unbiased Teacher code as our baseline, and the timm repository to implement the Swin Transformer.