<div align="center">Delivering Arbitrary-Modal Semantic Segmentation (CVPR 2023)</div>
<p align="center"> <a href="https://arxiv.org/pdf/2303.01480.pdf"> <img src="https://img.shields.io/badge/arXiv-2303.01480-red" /></a> <a href="https://jamycheung.github.io/DELIVER.html"> <img src="https://img.shields.io/badge/Project-page-green" /></a> <a href="https://www.youtube.com/watch?v=X-VeSLsEToA"> <img src="https://img.shields.io/badge/Video-YouTube-%23FF0000.svg" /></a> <a href="https://pytorch.org/"> <img src="https://img.shields.io/badge/Framework-PyTorch-orange.svg" /></a> <a href="https://github.com/jamycheung/DELIVER/blob/main/LICENSE"> <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" /></a> </p>

Introduction
To conduct arbitrary-modal semantic segmentation, we create the DeLiVER benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. It includes four severe weather conditions as well as five sensor failure cases to exploit modal complementarity and to resolve partial sensor outages. Besides, we present CMNeXt, an arbitrary cross-modal segmentation model that scales from 1 to 81 modalities on the DeLiVER, KITTI-360, MFNet, NYU Depth V2, UrbanLF, and MCubeS datasets.
For more details, please check our arXiv paper.
Updates
- 03/2023: initialize repository.
- 04/2023: release the front-view DeLiVER dataset. Download from GoogleDrive.
- 04/2023: release CMNeXt model weights. Download from GoogleDrive.
DeLiVER dataset
The DeLiVER multimodal dataset includes (a) four adverse conditions out of five conditions (cloudy, foggy, night-time, rainy, and sunny). Apart from the normal cases, each condition has five corner cases (MB: Motion Blur; OE: Over-Exposure; UE: Under-Exposure; LJ: LiDAR-Jitter; and EL: Event Low-resolution). Each sample has six views, and each view has four modalities and two labels (semantic and instance). (b) shows the data statistics. (c) shows the data distribution over the 25 semantic classes.
DELIVER splitting
Data folder structure
Download the DELIVER dataset from GoogleDrive (~12.2 GB).
The data/DELIVER folder is structured as:
DELIVER
├── depth
│ ├── cloud
│ │ ├── test
│ │ │ ├── MAP_10_point102
│ │ │ │ ├── 045050_depth_front.png
│ │ │ │ ├── ...
│ │ ├── train
│ │ └── val
│ ├── fog
│ ├── night
│ ├── rain
│ └── sun
├── event
├── hha
├── img
├── lidar
└── semantic
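Because the naming pattern is shared across modalities, the files for one sample can be located in every modality from a single depth path. Below is a minimal sketch, assuming the sub-folders mirror each other and the filename tag (e.g., _depth_, _rgb_) changes with the modality; the helper is hypothetical and not part of the repository:

import os

# Hypothetical helper: given e.g. DELIVER/depth/cloud/test/MAP_10_point102/045050_depth_front.png,
# build the corresponding paths of the other modalities. The folder layout and
# filename tags are assumptions based on the tree above.
MODALITY_TAGS = {"img": "rgb", "hha": "hha", "event": "event",
                 "lidar": "lidar", "semantic": "semantic"}

def sibling_paths(depth_path: str) -> dict:
    paths = {}
    for folder, tag in MODALITY_TAGS.items():
        p = depth_path.replace(f"{os.sep}depth{os.sep}", f"{os.sep}{folder}{os.sep}")
        paths[folder] = p.replace("_depth_", f"_{tag}_")
    return paths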
CMNeXt model
The CMNeXt architecture follows the Hub2Fuse paradigm with asymmetric branches, having, e.g., Multi-Head Self-Attention (MHSA) blocks in the RGB branch and our Parallel Pooling Mixer (PPX) blocks in the accompanying branch. At the hub step, the Self-Query Hub selects informative features from the supplementary modalities. At the fusion step, the Feature Rectification Module (FRM) and Feature Fusion Module (FFM) are used for feature fusion. Between stages, the features of each modality are restored by adding the fused feature. The four-stage fused features are forwarded to the segmentation head for the final prediction.
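A wiring-only sketch of one stage may help to follow this dataflow; the MHSA, PPX, Self-Query Hub, FRM, and FFM blocks are replaced by trivial stand-ins, so the snippet is illustrative only and not the released implementation:

import torch

def cmnext_stage_sketch(rgb_feat, sup_feats):
    # Stand-ins: the real model runs MHSA blocks on the RGB branch and PPX
    # blocks on the (shared) accompanying branch.
    rgb = rgb_feat
    sups = list(sup_feats)
    # Self-Query Hub stand-in: pool the supplementary features into one map.
    hub_out = torch.stack(sups, dim=0).mean(dim=0)
    # FRM + FFM stand-in: rectify and fuse the two streams into a single feature.
    fused = 0.5 * (rgb + hub_out)
    # Between stages, each modality's features are restored by adding the fused feature.
    return fused, rgb + fused, [s + fused for s in sups]

# Example: RGB plus three supplementary modalities (e.g., Depth, Event, LiDAR).
b, c, h, w = 2, 64, 32, 32
fused, rgb_next, sup_next = cmnext_stage_sketch(
    torch.randn(b, c, h, w), [torch.randn(b, c, h, w) for _ in range(3)])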
Environment
conda env create -f environment.yml
conda activate cmnext
# Optional: install apex following https://github.com/NVIDIA/apex
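After activating the environment, an optional quick check (standard PyTorch calls only) that the GPUs are visible:

# Optional sanity check of the environment.
import torch
print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())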
Data preparation
Prepare six datasets:
- DELIVER, for RGB-Depth-Event-LiDAR semantic segmentation.
- KITTI-360, for RGB-Depth-Event-LiDAR semantic segmentation.
- NYU Depth V2, for RGB-Depth semantic segmentation.
- MFNet, for RGB-Thermal semantic segmentation.
- UrbanLF, for light-field segmentation based on sub-aperture images.
- MCubeS, for multimodal material segmentation with RGB-A-D-N modalities.
Then, all datasets are structured as:
data/
├── DELIVER
│ ├── img
│ ├── hha
│ ├── event
│ ├── lidar
│ └── semantic
├── KITTI-360
│ ├── data_2d_raw
│ ├── data_2d_hha
│ ├── data_2d_event
│ ├── data_2d_lidar
│ └── data_2d_semantics
├── NYUDepthv2
│ ├── RGB
│ ├── HHA
│ └── Label
├── MFNet
│ ├── rgb
│ ├── ther
│ └── labels
├── UrbanLF
│ ├── Syn
│ └── real
├── MCubeS
│ ├── polL_color
│ ├── polL_aolp
│ ├── polL_dolp
│ ├── NIR_warped
│ └── SS
For RGB-Depth datasets, the HHA encoding is generated from the depth images.
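If you need to regenerate HHA maps yourself, the HHA encoding (Gupta et al.) packs horizontal disparity, height above ground, and the angle between the surface normal and the gravity direction into three channels. Below is a simplified, self-contained sketch; the camera intrinsics, camera height, and per-channel normalization are assumptions, and the released HHA files may have been produced with a different toolchain:

import numpy as np

def depth_to_hha_sketch(depth_m, fy, cy, cam_height=1.5):
    # Simplified HHA-style encoding from a metric depth map (H x W, meters).
    h, w = depth_m.shape
    v = np.arange(h, dtype=np.float64)[:, None]   # row index, broadcast over columns
    z = np.clip(depth_m, 1e-3, None)
    y = (v - cy) * z / fy                         # camera y axis assumed pointing down

    disparity = 1.0 / z                           # channel 1: horizontal disparity
    height = cam_height - y                       # channel 2: height above a flat ground plane
    dzdx = np.gradient(z, axis=1)                 # channel 3: angle between the surface
    dzdy = np.gradient(z, axis=0)                 # normal and the gravity direction
    normal = np.dstack([-dzdx, -dzdy, np.ones_like(z)])
    normal /= np.linalg.norm(normal, axis=2, keepdims=True)
    angle = np.degrees(np.arccos(np.clip(-normal[..., 1], -1.0, 1.0)))

    def to_uint8(a):
        a = a - a.min()
        return (255.0 * a / (a.max() + 1e-6)).astype(np.uint8)

    return np.dstack([to_uint8(disparity), to_uint8(height), to_uint8(angle)])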
Model Zoo
DELIVER dataset
Model-Modal | #Params(M) | GFLOPs | mIoU | weight |
---|---|---|---|---|
CMNeXt-RGB | 25.79 | 38.93 | 57.20 | GoogleDrive |
CMNeXt-RGB-E | 58.69 | 62.94 | 57.48 | GoogleDrive |
CMNeXt-RGB-L | 58.69 | 62.94 | 58.04 | GoogleDrive |
CMNeXt-RGB-D | 58.69 | 62.94 | 63.58 | GoogleDrive |
CMNeXt-RGB-D-E | 58.72 | 64.19 | 64.44 | GoogleDrive |
CMNeXt-RGB-D-L | 58.72 | 64.19 | 65.50 | GoogleDrive |
CMNeXt-RGB-D-E-L | 58.73 | 65.42 | 66.30 | GoogleDrive |
KITTI-360 dataset
Model-Modal | mIoU | weight |
---|---|---|
CMNeXt-RGB | 67.04 | GoogleDrive |
CMNeXt-RGB-E | 66.13 | GoogleDrive |
CMNeXt-RGB-L | 65.26 | GoogleDrive |
CMNeXt-RGB-D | 65.09 | GoogleDrive |
CMNeXt-RGB-D-E | 67.73 | GoogleDrive |
CMNeXt-RGB-D-L | 66.55 | GoogleDrive |
CMNeXt-RGB-D-E-L | 67.84 | GoogleDrive |
NYU Depth V2
Model-Modal | mIoU | weight |
---|---|---|
CMNeXt-RGB-D (MiT-B4) | 56.9 | GoogleDrive |
MFNet
Model-Modal | mIoU | weight |
---|---|---|
CMNeXt-RGB-T (MiT-B4) | 59.9 | GoogleDrive |
UrbanLF
UrbanLF provides both a real and a synthetic subset.
Model-Modal | Real | weight | Syn | weight |
---|---|---|---|---|
CMNeXt-RGB | 82.20 | GoogleDrive | 78.53 | GoogleDrive |
CMNeXt-RGB-LF8 | 83.22 | GoogleDrive | 80.74 | GoogleDrive |
CMNeXt-RGB-LF33 | 82.62 | GoogleDrive | 80.98 | GoogleDrive |
CMNeXt-RGB-LF80 | 83.11 | GoogleDrive | 81.02 | GoogleDrive |
MCubeS
Model-Modal | mIoU | weight |
---|---|---|
CMNeXt-RGB | 48.16 | GoogleDrive |
CMNeXt-RGB-A | 48.42 | GoogleDrive |
CMNeXt-RGB-A-D | 49.48 | GoogleDrive |
CMNeXt-RGB-A-D-N | 51.54 | GoogleDrive |
Training
Before training, please download the pre-trained SegFormer weights, e.g., checkpoints/pretrained/segformer/mit_b2.pth.
checkpoints/pretrained/segformer
├── mit_b2.pth
└── mit_b4.pth
To train a CMNeXt model, change the YAML file passed to --cfg. Example training commands using 4 A100 GPUs:
cd path/to/DELIVER
conda activate cmnext
export PYTHONPATH="path/to/DELIVER"
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/deliver_rgbdel.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/kitti360_rgbdel.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/nyu_rgbd.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/mfnet_rgbt.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/mcubes_rgbadn.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/urbanlf.yaml
Evaluation
To evaluate CMNeXt models, please download the respective model weights (GoogleDrive) and place them as:
output/
├── DELIVER
│ ├── cmnext_b2_deliver_rgb.pth
│ ├── cmnext_b2_deliver_rgbd.pth
│ ├── cmnext_b2_deliver_rgbde.pth
│ ├── cmnext_b2_deliver_rgbdel.pth
│ ├── cmnext_b2_deliver_rgbdl.pth
│ ├── cmnext_b2_deliver_rgbe.pth
│ └── cmnext_b2_deliver_rgbl.pth
├── KITTI360
│ ├── cmnext_b2_kitti360_rgb.pth
│ ├── cmnext_b2_kitti360_rgbd.pth
│ ├── cmnext_b2_kitti360_rgbde.pth
│ ├── cmnext_b2_kitti360_rgbdel.pth
│ ├── cmnext_b2_kitti360_rgbdl.pth
│ ├── cmnext_b2_kitti360_rgbe.pth
│ └── cmnext_b2_kitti360_rgbl.pth
├── MCubeS
│ ├── cmnext_b2_mcubes_rgb.pth
│ ├── cmnext_b2_mcubes_rgba.pth
│ ├── cmnext_b2_mcubes_rgbad.pth
│ └── cmnext_b2_mcubes_rgbadn.pth
├── MFNet
│ └── cmnext_b4_mfnet_rgbt.pth
├── NYU_Depth_V2
│ └── cmnext_b4_nyu_rgbd.pth
├── UrbanLF
│ ├── cmnext_b4_urbanlf_real_rgblf1.pth
│ ├── cmnext_b4_urbanlf_real_rgblf33.pth
│ ├── cmnext_b4_urbanlf_real_rgblf8.pth
│ ├── cmnext_b4_urbanlf_real_rgblf80.pth
│ ├── cmnext_b4_urbanlf_syn_rgblf1.pth
│ ├── cmnext_b4_urbanlf_syn_rgblf33.pth
│ ├── cmnext_b4_urbanlf_syn_rgblf8.pth
│ └── cmnext_b4_urbanlf_syn_rgblf80.pth
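Before running evaluation, a downloaded weight file can be sanity-checked with standard PyTorch calls; the state-dict layout assumed below is a guess and may need adjusting:

import torch

# Inspect a downloaded CMNeXt checkpoint (the file layout is an assumption).
ckpt = torch.load("output/DELIVER/cmnext_b2_deliver_rgbdel.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))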
Then, set --cfg to the respective config file and run:
cd path/to/DELIVER
conda activate cmnext
export PYTHONPATH="path/to/DELIVER"
CUDA_VISIBLE_DEVICES=0 python tools/val_mm.py --cfg configs/deliver_rgbdel.yaml
The DeLiVER dataset has both validation and test sets; please check val_mm.py to switch between them.
To evaluate the different cases (adverse weather conditions and sensor failures), modify the cases list in val_mm.py, as shown below:
# cases = ['cloud', 'fog', 'night', 'rain', 'sun']
# cases = ['motionblur', 'overexposure', 'underexposure', 'lidarjitter', 'eventlowres']
cases = [None] # all
Note that the default value is [None], which evaluates all cases.
DELIVER visualization
<img src="figs/DELIVER_vis.png" width="500px">The visualization results on DELIVER dataset. From left to right are the respective cloudy, foggy, night and rainy scene.
Acknowledgements
Thanks to the following public repositories:
License
This repository is under the Apache-2.0 license. For commercial use, please contact the authors.
Citations
If you use the DeLiVER dataset or the CMNeXt model, please cite the following works:
- DeLiVER & CMNeXt [PDF]
@inproceedings{zhang2023delivering,
title={Delivering Arbitrary-Modal Semantic Segmentation},
author={Zhang, Jiaming and Liu, Ruiping and Shi, Hao and Yang, Kailun and Rei{\ss}, Simon and Peng, Kunyu and Fu, Haodong and Wang, Kaiwei and Stiefelhagen, Rainer},
booktitle={CVPR},
year={2023}
}
- CMX [PDF]
@article{zhang2023cmx,
title={CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers},
author={Zhang, Jiaming and Liu, Huayao and Yang, Kailun and Hu, Xinxin and Liu, Ruiping and Stiefelhagen, Rainer},
journal={IEEE Transactions on Intelligent Transportation Systems},
year={2023}
}