DeepLabv3Plus-Pytorch

Pretrained DeepLabv3 and DeepLabv3+ models for Pascal VOC & Cityscapes.

Quick Start

1. Available Architectures

| DeepLabV3 | DeepLabV3+ |
| --- | --- |
| deeplabv3_resnet50 | deeplabv3plus_resnet50 |
| deeplabv3_resnet101 | deeplabv3plus_resnet101 |
| deeplabv3_mobilenet | deeplabv3plus_mobilenet |
| deeplabv3_hrnetv2_48 | deeplabv3plus_hrnetv2_48 |
| deeplabv3_hrnetv2_32 | deeplabv3plus_hrnetv2_32 |
| deeplabv3_xception | deeplabv3plus_xception |

Please refer to network/modeling.py for all model entries.
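
To enumerate the entries programmatically, a small sketch (it relies on the same network.modeling.__dict__ lookup shown in step 2 below):

import network

# Print every model constructor exposed by network.modeling,
# e.g. 'deeplabv3plus_mobilenet'.
print([name for name in network.modeling.__dict__ if name.startswith('deeplabv3')])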

Download pretrained models: Dropbox, Tencent Weiyun

Note: The HRNet backbone was contributed by @timothylimyl. A pre-trained backbone is available on Google Drive.

2. Load the pretrained model:

model = network.modeling.__dict__[MODEL_NAME](num_classes=NUM_CLASSES, output_stride=OUTPUT_STRIDE)
model.load_state_dict(torch.load(PATH_TO_PTH)['model_state'])
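
For example, a minimal end-to-end loading sketch (the model name, class count, and checkpoint path below are placeholders; substitute your own):

import torch
import network

# 21 classes for Pascal VOC; use 19 for Cityscapes.
model = network.modeling.__dict__['deeplabv3plus_mobilenet'](num_classes=21, output_stride=16)
checkpoint = torch.load('checkpoints/best_deeplabv3plus_mobilenet_voc_os16.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state'])
model.eval()  # switch BatchNorm/Dropout to inference mode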

3. Visualize segmentation outputs:

outputs = model(images)
preds = outputs.max(1)[1].detach().cpu().numpy()  # (N, H, W) class-index map
colorized_preds = val_dst.decode_target(preds).astype('uint8')  # RGB images, shape (N, H, W, 3), values 0~255, numpy array
# Do whatever you like here with the colorized segmentation maps
colorized_preds = Image.fromarray(colorized_preds[0])  # to a PIL Image
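
If you need to build the images batch yourself, here is a minimal preprocessing sketch. It assumes the standard ImageNet normalization; match whatever your checkpoint was trained with ('example.jpg' is a placeholder):

import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),  # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # assumed ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])
img = Image.open('example.jpg').convert('RGB')
images = preprocess(img).unsqueeze(0)  # add batch dim: (1, 3, H, W)
with torch.no_grad():
    outputs = model(images)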

4. Atrous Separable Convolution

Note: All pre-trained models in this repo were trained without atrous separable convolution.

Atrous separable convolution is supported in this repo. We provide a simple tool, network.convert_to_separable_conv, that converts nn.Conv2d into AtrousSeparableConvolution. Run main.py with '--separable_conv' if it is required. See 'main.py' and 'network/_deeplab.py' for more details.
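
For reference, a hedged sketch of the direct conversion call (main.py applies it for you when '--separable_conv' is set; the choice of model.classifier as the target is an assumption based on main.py, so double-check before copying):

import network

# Replace nn.Conv2d modules inside the classifier head with
# AtrousSeparableConvolution (implementation in network/_deeplab.py).
network.convert_to_separable_conv(model.classifier)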

5. Prediction

Single image:

python predict.py --input datasets/data/cityscapes/leftImg8bit/train/bremen/bremen_000000_000019_leftImg8bit.png  --dataset cityscapes --model deeplabv3plus_mobilenet --ckpt checkpoints/best_deeplabv3plus_mobilenet_cityscapes_os16.pth --save_val_results_to test_results

Image folder:

python predict.py --input datasets/data/cityscapes/leftImg8bit/train/bremen  --dataset cityscapes --model deeplabv3plus_mobilenet --ckpt checkpoints/best_deeplabv3plus_mobilenet_cityscapes_os16.pth --save_val_results_to test_results

6. New backbones

Please refer to this commit (Xception) for more details about how to add new backbones.

7. New datasets

You can train DeepLab models on your own datasets. Your torch.utils.data.Dataset should provide a decoding method that transforms your predictions into colorized images, just like the VOC Dataset:


from torch.utils import data

class MyDataset(data.Dataset):
    ...
    @classmethod
    def decode_target(cls, mask):
        """Decode a semantic mask to an RGB image."""
        return cls.cmap[mask]
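
Here cmap is a per-class color table. A minimal sketch of one way to build it for a hypothetical 3-class dataset (the palette below is arbitrary):

import numpy as np

# One RGB triple per class id; indexing with an integer mask of shape
# (N, H, W) yields the (N, H, W, 3) uint8 RGB array that decode_target returns.
cmap = np.array([[0, 0, 0],       # class 0: background
                 [128, 0, 0],     # class 1
                 [0, 128, 0]],    # class 2
                dtype=np.uint8)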

Results

1. Performance on Pascal VOC2012 Aug (21 classes, 513 x 513)

Training: 513x513 random crop
Validation: 513x513 center crop

| Model | Batch Size | FLOPs | train/val OS | mIoU | Dropbox | Tencent Weiyun |
| --- | --- | --- | --- | --- | --- | --- |
| DeepLabV3-MobileNet | 16 | 6.0G | 16/16 | 0.701 | Download | Download |
| DeepLabV3-ResNet50 | 16 | 51.4G | 16/16 | 0.769 | Download | Download |
| DeepLabV3-ResNet101 | 16 | 72.1G | 16/16 | 0.773 | Download | Download |
| DeepLabV3Plus-MobileNet | 16 | 17.0G | 16/16 | 0.711 | Download | Download |
| DeepLabV3Plus-ResNet50 | 16 | 62.7G | 16/16 | 0.772 | Download | Download |
| DeepLabV3Plus-ResNet101 | 16 | 83.4G | 16/16 | 0.783 | Download | Download |

2. Performance on Cityscapes (19 classes, 1024 x 2048)

Training: 768x768 random crop
Validation: 1024x2048

| Model | Batch Size | FLOPs | train/val OS | mIoU | Dropbox | Tencent Weiyun |
| --- | --- | --- | --- | --- | --- | --- |
| DeepLabV3Plus-MobileNet | 16 | 135G | 16/16 | 0.721 | Download | Download |
| DeepLabV3Plus-ResNet101 | 16 | N/A | 16/16 | 0.762 | Download | N/A |

Segmentation Results on Pascal VOC2012 (DeepLabv3Plus-MobileNet)

<div> <img src="samples/1_image.png" width="20%"> <img src="samples/1_target.png" width="20%"> <img src="samples/1_pred.png" width="20%"> <img src="samples/1_overlay.png" width="20%"> </div> <div> <img src="samples/23_image.png" width="20%"> <img src="samples/23_target.png" width="20%"> <img src="samples/23_pred.png" width="20%"> <img src="samples/23_overlay.png" width="20%"> </div> <div> <img src="samples/114_image.png" width="20%"> <img src="samples/114_target.png" width="20%"> <img src="samples/114_pred.png" width="20%"> <img src="samples/114_overlay.png" width="20%"> </div>

Segmentation Results on Cityscapes (DeepLabv3Plus-MobileNet)

<div> <img src="samples/city_1_target.png" width="45%"> <img src="samples/city_1_overlay.png" width="45%"> </div> <div> <img src="samples/city_6_target.png" width="45%"> <img src="samples/city_6_overlay.png" width="45%"> </div>

Visualization of training

(screenshot: training visualization in visdom)

Pascal VOC

1. Requirements

pip install -r requirements.txt

2. Prepare Datasets

2.1 Standard Pascal VOC

You can run main.py with the "--download" option to download and extract the Pascal VOC dataset. The default path is './datasets/data':

/datasets
    /data
        /VOCdevkit 
            /VOC2012 
                /SegmentationClass
                /JPEGImages
                ...
            ...
        /VOCtrainval_11-May-2012.tar
        ...

2.2 Pascal VOC trainaug (Recommended)

See Section 4 of [2]:

    The original dataset contains 1464 (train), 1449 (val), and 1456 (test) pixel-level annotated images. We augment the dataset by the extra annotations provided by [76], resulting in 10582 (trainaug) training images. The performance is measured in terms of pixel intersection-over-union averaged across the 21 classes (mIOU).

./datasets/data/train_aug.txt lists the file names of the 10582 trainaug images (val images are excluded). Please download the labels from Dropbox or Tencent Weiyun; they come from DrSleep's repo.

Extract the trainaug labels (SegmentationClassAug) to the VOC2012 directory.

/datasets
    /data
        /VOCdevkit  
            /VOC2012
                /SegmentationClass
                /SegmentationClassAug  # <= the trainaug labels
                /JPEGImages
                ...
            ...
        /VOCtrainval_11-May-2012.tar
        ...

3. Training on Pascal VOC2012 Aug

3.1 Visualize training (Optional)

Start the visdom server for visualization. Please remove '--enable_vis' if visualization is not needed.

# Run visdom server on port 28333
visdom -port 28333

3.2 Training with OS=16

Run main.py with "--year 2012_aug" to train your model on Pascal VOC2012 Aug. You can also parallelize training across 4 GPUs with '--gpu_id 0,1,2,3'.

Note: There is no SyncBN in this repo, so training with multiple GPUs and a small batch size may degrade performance. See PyTorch-Encoding for more details about SyncBN.

python main.py --model deeplabv3plus_mobilenet --enable_vis --vis_port 28333 --gpu_id 0 --year 2012_aug --crop_val --lr 0.01 --crop_size 513 --batch_size 16 --output_stride 16

3.3 Continue training

Run main.py with '--continue_training' to restore the state_dict of optimizer and scheduler from YOUR_CKPT.

python main.py ... --ckpt YOUR_CKPT --continue_training

3.4 Testing

Results will be saved at ./results.

python main.py --model deeplabv3plus_mobilenet --enable_vis --vis_port 28333 --gpu_id 0 --year 2012_aug --crop_val --lr 0.01 --crop_size 513 --batch_size 16 --output_stride 16 --ckpt checkpoints/best_deeplabv3plus_mobilenet_voc_os16.pth --test_only --save_val_results

Cityscapes

1. Download Cityscapes and extract it to 'datasets/data/cityscapes'

/datasets
    /data
        /cityscapes
            /gtFine
            /leftImg8bit

2. Train your model on Cityscapes

python main.py --model deeplabv3plus_mobilenet --dataset cityscapes --enable_vis --vis_port 28333 --gpu_id 0  --lr 0.1  --crop_size 768 --batch_size 16 --output_stride 16 --data_root ./datasets/data/cityscapes 

Reference

[1] Rethinking Atrous Convolution for Semantic Image Segmentation

[2] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation