AdapNet++: Self-Supervised Model Adaptation for Multimodal Semantic Segmentation

AdapNet++ is a compact, state-of-the-art deep learning model for semantic image segmentation, where the goal is to assign a semantic label (e.g., car, road, tree) to every pixel in the input image. It is easily trainable on a single GPU with 12 GB of memory, has a fast inference time, and is benchmarked on the Cityscapes, Synthia, ScanNet, SUN RGB-D, and Freiburg Forest datasets.

This repository contains our TensorFlow implementation of AdapNet++, which allows you to train your own model on any dataset and evaluate results in terms of the mean IoU metric.

AdapNet++ can further be used with the SSMA or CMoDE fusion schemes for multimodal semantic segmentation.

If you find the code useful for your research, please consider citing our paper:

@article{valada19ijcv,
         author = {Valada, Abhinav and Mohan, Rohit and Burgard, Wolfram},
         title = {Self-Supervised Model Adaptation for Multimodal Semantic Segmentation},
         journal = {International Journal of Computer Vision (IJCV)},
         year = {2019},
         month = {jul},
         doi = {10.1007/s11263-019-01188-y},
         note = {Special Issue: Deep Learning for Robotic Vision},
         issn = {1573-1405},
         day = {08}
}

Live Demo

http://deepscene.cs.uni-freiburg.de

Example Segmentation Results

| Dataset | RGB Image | Segmented Image |
| --- | --- | --- |
| Cityscapes | <img src="images/city2.png" width=300> | <img src="images/city2_pred_v2.png" width=300> |
| Forest | <img src="images/forest2.png" width=300> | <img src="images/forest2_pred_v2.png" width=300> |
| Sun RGB-D | <img src="images/sun1.png" width=300> | <img src="images/sun1_pred_v2.png" width=300> |
| Synthia | <img src="images/synthia1.png" width=300> | <img src="images/synthia1_pred_v2.png" width=300> |
| ScanNet v2 | <img src="images/scannet1.png" width=300> | <img src="images/scannet1_pred_v2.png" width=300> |

Contacts

System Requirements

Programming Language

Python 2.7

Python Packages

tensorflow-gpu 1.4.0
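
If it is not already installed, the required version can typically be installed with pip (a CUDA/cuDNN setup compatible with TensorFlow 1.4 is assumed):

    pip install tensorflow-gpu==1.4.0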

Configure the Network

Data

Run convert_to_tfrecords.py from the dataset folder for each of the train, test, and val sets to create the tfrecords:

   python convert_to_tfrecords.py --file path_to_.txt_file --record tf_records_name 

(The input to the model is expected in BGR channel order and 'NHWC' format.)
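
As a quick sanity check, the minimal sketch below (TensorFlow 1.x API) counts the serialized examples in a generated .tfrecords file; the file name is hypothetical.

    # Minimal sketch: count the examples written into a .tfrecords file.
    # Assumes the TensorFlow 1.x API (tf.python_io); the path is illustrative.
    import tensorflow as tf

    def count_records(path):
        """Return the number of serialized examples in a .tfrecords file."""
        return sum(1 for _ in tf.python_io.tf_record_iterator(path))

    if __name__ == "__main__":
        # Hypothetical file name; use the tf_records_name passed to convert_to_tfrecords.py.
        print(count_records("cityscapes_train.tfrecords"))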

Training Params

    gpu_id: id of gpu to be used
    model: name of the model
    num_classes: number of classes (including void, label id:0)
    initialize: path to the pre-trained model
    checkpoint: path to save model
    train_data: path to dataset .tfrecords
    batch_size: training batch size
    skip_step: interval (in steps) at which the training loss is printed
    height: height of the input image
    width: width of the input image
    max_iteration: total number of training iterations
    learning_rate: initial learning rate
    save_step: interval (in steps) at which the model checkpoint is saved
    power: exponent of the poly learning rate decay
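
For reference, a minimal sketch of the poly learning-rate decay that the learning_rate, max_iteration, and power parameters control; the values below are illustrative, not shipped defaults.

    # Minimal sketch of the poly learning-rate schedule commonly used for
    # semantic segmentation: the base learning rate decays polynomially to
    # zero over max_iteration steps.
    def poly_learning_rate(base_lr, step, max_iteration, power):
        """Polynomially decay base_lr to zero over max_iteration steps."""
        return base_lr * (1.0 - float(step) / max_iteration) ** power

    if __name__ == "__main__":
        # Illustrative values only.
        for step in (0, 50000, 100000, 150000):
            print(step, poly_learning_rate(1e-3, step, max_iteration=150000, power=0.9))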

Evaluation Params

    gpu_id: id of gpu to be used
    model: name of the model
    num_classes: number of classes (including void, label id:0)
    checkpoint: path to saved model
    test_data: path to dataset .tfrecords
    batch_size: evaluation batch size
    skip_step: interval (in steps) at which the mIoU is printed
    height: height of the input image
    width: width of the input image

Training and Evaluation

Training Procedure

Edit the config file for training in the config folder, then run:

    python train.py -c config/cityscapes_train.config

or, equivalently:

    python train.py --config config/cityscapes_train.config

Evaluation Procedure

Select a checkpoint to test/validate your model in terms of the mean IoU metric. Edit the config file for evaluation in the config folder, then run:

    python evaluate.py -c config/cityscapes_test.config

or, equivalently:

    python evaluate.py --config config/cityscapes_test.config
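
For reference, a minimal sketch of the mean IoU metric (per-class intersection-over-union averaged over the classes, computed here from a confusion matrix with NumPy); this is illustrative only and independent of the implementation in evaluate.py.

    # Minimal sketch: mean IoU from a confusion matrix.
    import numpy as np

    def mean_iou(confusion):
        """confusion[i, j] = number of pixels of ground-truth class i predicted as class j."""
        intersection = np.diag(confusion).astype(np.float64)
        union = confusion.sum(axis=0) + confusion.sum(axis=1) - np.diag(confusion)
        return np.mean(intersection / np.maximum(union, 1))

    if __name__ == "__main__":
        # Toy 3-class confusion matrix (rows: ground truth, columns: prediction).
        conf = np.array([[50, 2, 3],
                         [4, 40, 6],
                         [1, 2, 30]])
        print("mIoU = %.4f" % mean_iou(conf))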

Models

Cityscapes (void + 11 classes)

| Modality | mIoU (%) |
| --- | --- |
| RGB | 80.77 |
| Depth | 65.01 |
| HHA | 67.63 |

Synthia (void + 11 classes)

| Modality | mIoU (%) |
| --- | --- |
| RGB | 86.68 |
| Depth | 87.87 |

SUN RGB-D (void + 37 classes)

| Modality | mIoU (%) |
| --- | --- |
| RGB | 37.98 |
| Depth | 34.28 |
| HHA | 34.59 |

ScanNet v2 (void + 20 classes)

| Modality | mIoU (%) |
| --- | --- |
| RGB | 52.92 |
| Depth | 53.8 |
| HHA | 54.19 |

Freiburg Forest (void + 5 classes)

| Modality | mIoU (%) |
| --- | --- |
| RGB | 83.18 |
| Depth | 73.93 |
| EVI | 80.96 |

Cityscapes (void + 19 classes)

| Modality |
| --- |
| RGB |
| HHA |

Benchmark Results

Cityscapes

| Method | Backbone | mIoU_val (%) | mIoU_test (%) | Params (M) | Time (ms) |
| --- | --- | --- | --- | --- | --- |
| DRN | WideResNet-38 | 79.69 | 82.82 | 129.16 | 1259.67 |
| DPC | Modified Xception | 80.85 | 82.66 | 41.82 | 144.41 |
| SSMA | ResNet-50 | 82.19 | 82.31 | 56.44 | 101.95 |
| DeepLabv3+ | Modified Xception | 79.55 | 82.14 | 43.48 | 127.97 |
| Mapillary | WideResNet-38 | 78.31 | 82.03 | 135.86 | 214.46 |
| Adapnet++ | ResNet-50 | 81.24 | 81.34 | 30.20 | 72.94 |
| DeepLabv3 | ResNet-101 | 79.30 | 81.34 | 58.16 | 79.90 |
| PSPNet | ResNet-101 | 80.91 | 81.19 | 56.27 | 172.42 |

ScanNet v2

| Method | mIoU_test (%) |
| --- | --- |
| SSMA | 57.7 |
| FuseNet | 52.1 |
| Adapnet++ | 50.3 |
| 3DMV (2d proj) | 49.8 |
| ILC-PSPNet | 47.5 |

Additional Notes:

License

For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.