DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation

This project contains the PyTorch implementation of the proposed DABNet: [arXiv].

Introduction

<p align="center"><img width="100%" src="./image/architecture.png" /></p>

As a pixel-level prediction task, semantic segmentation typically requires a large computational cost and a large number of parameters to reach high accuracy. With the growing demand for autonomous systems and robots, it has become important to balance accuracy against inference speed. In this paper, we propose a novel Depth-wise Asymmetric Bottleneck (DAB) module to address this dilemma; it combines depth-wise asymmetric convolution and dilated convolution to build an efficient bottleneck structure. Based on the DAB module, we design the Depth-wise Asymmetric Bottleneck Network (DABNet) specifically for real-time semantic segmentation, which creates a sufficient receptive field and densely utilizes contextual information. Experiments on the Cityscapes and CamVid datasets demonstrate that the proposed DABNet achieves a balance between speed and precision. Specifically, without any pretrained model or post-processing, it achieves 70.1% mean IoU on the Cityscapes test set with only 0.76 million parameters and runs at 104 FPS on a single GTX 1080Ti card.
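
To give a rough sense of why depth-wise asymmetric and dilated convolutions keep the parameter count low, the sketch below counts the weights in a standard 3x3 convolution and in a depth-wise 3x1 plus 1x3 pair, and computes the extent a dilated kernel covers. The channel width of 64, the dilation rate of 4, and the helper names are illustrative assumptions, not values or code from this repository.

```python
def conv_params(c_in, c_out, kh, kw, groups=1, bias=False):
    """Number of learnable parameters in a 2D convolution layer."""
    assert c_in % groups == 0 and c_out % groups == 0
    weights = (c_in // groups) * c_out * kh * kw
    return weights + (c_out if bias else 0)


def effective_kernel(k, dilation):
    """Extent covered along one axis by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


channels = 64  # illustrative width (assumption), not taken from the paper

# Standard 3x3 convolution that mixes all channels.
standard_3x3 = conv_params(channels, channels, 3, 3)

# Depth-wise (groups == channels) asymmetric pair: 3x1 followed by 1x3.
depthwise_asymmetric = (
    conv_params(channels, channels, 3, 1, groups=channels)
    + conv_params(channels, channels, 1, 3, groups=channels)
)

print(standard_3x3)            # 36864
print(depthwise_asymmetric)    # 384
print(effective_kernel(3, 4))  # 9
```

The depth-wise pair does not mix information across channels, so it is not a drop-in replacement for the standard convolution; the count only illustrates where the savings come from.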

Installation

pip install opencv-python pillow numpy matplotlib 
git clone https://github.com/Reagan1311/DABNet
cd DABNet

Dataset

You need to download the two datasets, CamVid and Cityscapes, and put the files in the dataset folder with the following structure.

├── camvid
|    ├── train
|    ├── test
|    ├── val 
|    ├── trainannot
|    ├── testannot
|    ├── valannot
|    ├── camvid_trainval_list.txt
|    ├── camvid_train_list.txt
|    ├── camvid_test_list.txt
|    └── camvid_val_list.txt
├── cityscapes
|    ├── gtCoarse
|    ├── gtFine
|    ├── leftImg8bit
|    ├── cityscapes_trainval_list.txt
|    ├── cityscapes_train_list.txt
|    ├── cityscapes_test_list.txt
|    └── cityscapes_val_list.txt           
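
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch that lists any expected entry that is missing; the function name `missing_entries` is hypothetical, and the root folder name `dataset` is assumed to be relative to the repository root.

```python
from pathlib import Path

# Expected entries, copied from the directory tree above.
EXPECTED = {
    "camvid": [
        "train", "test", "val",
        "trainannot", "testannot", "valannot",
        "camvid_trainval_list.txt", "camvid_train_list.txt",
        "camvid_test_list.txt", "camvid_val_list.txt",
    ],
    "cityscapes": [
        "gtCoarse", "gtFine", "leftImg8bit",
        "cityscapes_trainval_list.txt", "cityscapes_train_list.txt",
        "cityscapes_test_list.txt", "cityscapes_val_list.txt",
    ],
}


def missing_entries(root, expected=EXPECTED):
    """Return relative paths listed in `expected` that do not exist under `root`."""
    root = Path(root)
    return [
        f"{name}/{entry}"
        for name, entries in expected.items()
        for entry in entries
        if not (root / name / entry).exists()
    ]


if __name__ == "__main__":
    # "dataset" is an assumed location relative to the repository root.
    missing = missing_entries("dataset")
    if missing:
        print("Missing entries:")
        for path in missing:
            print("  " + path)
    else:
        print("Dataset layout looks complete.")
```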

Training

python train.py --dataset ${camvid, cityscapes} --train_type ${train, trainval} --max_epochs ${EPOCHS} --batch_size ${BATCH_SIZE} --lr ${LR} --resume ${CHECKPOINT_FILE}
python train.py --dataset cityscapes
python train.py --dataset camvid --train_type trainval --max_epochs 1000 --lr 1e-3 --batch_size 16
[Figures: Val mIoU vs Epochs; Train loss vs Epochs]

(PS: Based on the graphs, training does not appear to have saturated yet, and the learning rate may be too large, so you can tune the hyper-parameters to get better results.)
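
If you want to try several hyper-parameter settings, a small helper that assembles the training command can be convenient. This is only a sketch: `build_train_command` is a hypothetical name, it uses only the flags shown in the commands above, and the second learning rate (`5e-4`) is an arbitrary illustrative value.

```python
import shlex


def build_train_command(dataset, train_type=None, max_epochs=None,
                        batch_size=None, lr=None, resume=None):
    """Assemble the argument list for train.py; options left as None are omitted."""
    if dataset not in ("camvid", "cityscapes"):
        raise ValueError(f"unsupported dataset: {dataset!r}")
    if train_type not in (None, "train", "trainval"):
        raise ValueError(f"unsupported train type: {train_type!r}")

    cmd = ["python", "train.py", "--dataset", dataset]
    options = {
        "--train_type": train_type,
        "--max_epochs": max_epochs,
        "--batch_size": batch_size,
        "--lr": lr,
        "--resume": resume,
    }
    for flag, value in options.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Print one command per learning rate; pass the list to
    # subprocess.run(cmd, check=True) from the repository root to launch it.
    for lr in ("1e-3", "5e-4"):
        cmd = build_train_command("camvid", train_type="trainval",
                                  max_epochs=1000, batch_size=16, lr=lr)
        print(shlex.join(cmd))
```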

Testing

python test.py --dataset ${camvid, cityscapes} --checkpoint ${CHECKPOINT_FILE}

Evaluation

python predict.py --checkpoint ${CHECKPOINT_FILE}

Inference Speed

python eval_fps.py 512,1024
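
For readers curious how a frames-per-second figure is typically obtained, here is a generic timing sketch. It is not the code in `eval_fps.py`: the helper names are hypothetical, the `512,1024` argument is assumed to mean height,width, and a plain Python workload stands in for the network's forward pass.

```python
import time


def parse_size(text):
    """Parse an 'H,W' string such as '512,1024' into a tuple of ints."""
    height, width = (int(part) for part in text.split(","))
    return height, width


def measure_fps(step, iterations=100, warmup=10):
    """Call `step` repeatedly and return calls per second, excluding warm-up."""
    for _ in range(warmup):
        step()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


if __name__ == "__main__":
    height, width = parse_size("512,1024")
    # Stand-in workload (assumption); replace with a model forward pass.
    # On a GPU, synchronize the device before reading the clock, since
    # kernels are launched asynchronously.
    fps = measure_fps(lambda: sum(range(height * width // 64)))
    print(f"{fps:.1f} calls per second at {height}x{width}")
```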

Results

| Dataset | Pretrained | Train type | mIoU | FPS | Model |
|---|---|---|---|---|---|
| Cityscapes (Fine) | from scratch | trainval | 70.07% | 104 | Detailed result |
| Cityscapes (Fine) | from scratch | train | 69.57% | 104 | GoogleDrive |
| CamVid | from scratch | trainval | 66.72% | 146 | GoogleDrive |
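
The mIoU column is the mean, over classes, of intersection over union. As a reminder of how that metric is computed, here is a minimal pure-Python sketch based on a confusion matrix. The function names are hypothetical, the ignore label of 255 is an assumption borrowed from common Cityscapes practice, and the repository's own evaluation code may differ.

```python
def confusion_matrix(truth, pred, num_classes, ignore_label=255):
    """Count (true class, predicted class) pairs, skipping ignored pixels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        if t == ignore_label:
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(row[c] for row in matrix) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy labels: the last pixel is ignored.
truth = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]

# Per-class IoU is 1/2, 2/3 and 1, so the mean is 13/18.
print(round(mean_iou(confusion_matrix(truth, pred, 3)), 4))  # 0.7222
```
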
<p align="center"><img width="100%" src="./image/DABNet_demo.png" /></p>

Citation

Please consider citing DABNet if it is helpful for your research.

@inproceedings{li2019dabnet,
  title={DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation},
  author={Li, Gen and Kim, Joongkyu},
  booktitle={British Machine Vision Conference},
  year={2019}
}

Thanks to the Third Party Libs

PyTorch
Pytorch-Deeplab
ERFNet
CGNet