
<div align="center">

Delivering Arbitrary-Modal Semantic Segmentation (CVPR 2023)

</div> <p align="center"> <a href="https://arxiv.org/pdf/2303.01480.pdf"> <img src="https://img.shields.io/badge/arXiv-2303.01480-red" /></a> <a href="https://jamycheung.github.io/DELIVER.html"> <img src="https://img.shields.io/badge/Project-page-green" /></a> <a href="https://www.youtube.com/watch?v=X-VeSLsEToA"> <img src="https://img.shields.io/badge/Video-YouTube-%23FF0000.svg" /></a> <a href="https://pytorch.org/"> <img src="https://img.shields.io/badge/Framework-PyTorch-orange.svg" /></a> <a href="https://github.com/jamycheung/DELIVER/blob/main/LICENSE"> <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" /></a> </p>


Introduction

To conduct arbitrary-modal semantic segmentation, we create the DeLiVER benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. It includes four severe weather conditions as well as five sensor failure cases to exploit modal complementarity and resolve partial outages. In addition, we present CMNeXt, an arbitrary cross-modal segmentation model that scales from 1 to 81 modalities on the DeLiVER, KITTI-360, MFNet, NYU Depth V2, UrbanLF, and MCubeS datasets.

For more details, please check our arXiv paper.

Updates

DeLiVER dataset


The DeLiVER multimodal dataset includes (a) four adverse conditions out of five (cloudy, foggy, night-time, rainy, and sunny). Apart from the normal case, each condition has five corner cases (MB: Motion Blur; OE: Over-Exposure; UE: Under-Exposure; LJ: LiDAR-Jitter; and EL: Event Low-resolution). Each sample has six views, and each view has four modalities and two labels (semantic and instance). (b) shows the data statistics. (c) shows the distribution of the 25 semantic classes.

DELIVER splitting


Data folder structure

Download the DELIVER dataset from GoogleDrive (~12.2 GB).

The data/DELIVER folder is structured as:

DELIVER
├── depth
│   ├── cloud
│   │   ├── test
│   │   │   ├── MAP_10_point102
│   │   │   │   ├── 045050_depth_front.png
│   │   │   │   ├── ...
│   │   ├── train
│   │   └── val
│   ├── fog
│   ├── night
│   ├── rain
│   └── sun
├── event
├── hha
├── img
├── lidar
└── semantic
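
For convenience, here is a minimal sketch of how the file paths of a single sample can be assembled under this layout. It assumes that every modality folder mirrors the depth naming pattern shown above (e.g., 045050_depth_front.png); the exact filename suffix of each modality is an assumption, so please verify it against the downloaded data.

import os

# Minimal sketch: collect the files of one DELIVER sample across modalities.
# Assumption: each modality folder mirrors the depth filename pattern
# <frame>_<modality>_<view>.png; check the actual suffixes in the download.
root = "data/DELIVER"
condition, split, sequence = "cloud", "test", "MAP_10_point102"
frame, view = "045050", "front"

modalities = ["img", "depth", "event", "hha", "lidar", "semantic"]
sample = {
    m: os.path.join(root, m, condition, split, sequence, f"{frame}_{m}_{view}.png")
    for m in modalities
}
for modality, path in sample.items():
    print(f"{modality:>8}: {path} (exists: {os.path.exists(path)})")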

CMNeXt model


The CMNeXt architecture follows the Hub2Fuse paradigm with asymmetric branches, e.g., Multi-Head Self-Attention (MHSA) blocks in the RGB branch and our Parallel Pooling Mixer (PPX) blocks in the accompanying branch. At the hub step, the Self-Query Hub selects informative features from the supplementary modalities. At the fusion step, the Feature Rectification Module (FRM) and Feature Fusion Module (FFM) are used for feature fusion. Between stages, the features of each modality are restored by adding the fused features. The fused features from all four stages are forwarded to the segmentation head for the final prediction.
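
To make this per-stage flow concrete, below is a deliberately simplified, self-contained sketch. It is not the repository's implementation: the stand-ins for the MHSA blocks, the PPX blocks, the Self-Query Hub, and FRM/FFM are all illustrative placeholders.

import torch
import torch.nn as nn

class SelfQueryHub(nn.Module):
    """Placeholder: selects informative features from the supplementary modalities."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, supplementary_feats):
        # Score each supplementary modality and take a score-weighted sum.
        scores = torch.stack([self.score(f).mean(dim=(1, 2, 3)) for f in supplementary_feats], dim=1)
        weights = scores.softmax(dim=1)                       # (B, M)
        stacked = torch.stack(supplementary_feats, dim=1)     # (B, M, C, H, W)
        return (weights[:, :, None, None, None] * stacked).sum(dim=1)

class FusionStage(nn.Module):
    """Placeholder for one stage: RGB branch + accompanying branch, hub selection, fusion."""
    def __init__(self, channels):
        super().__init__()
        self.rgb_branch = nn.Conv2d(channels, channels, 3, padding=1)  # stands in for MHSA blocks
        self.x_branch = nn.Conv2d(channels, channels, 3, padding=1)    # stands in for PPX blocks
        self.hub = SelfQueryHub(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)               # stands in for FRM + FFM

    def forward(self, rgb, others):
        rgb_feat = self.rgb_branch(rgb)
        other_feats = [self.x_branch(o) for o in others]
        selected = self.hub(other_feats)
        fused = self.fuse(torch.cat([rgb_feat, selected], dim=1))
        # Between stages, each modality is restored by adding the fused feature.
        rgb_next = rgb_feat + fused
        others_next = [o + fused for o in other_feats]
        return fused, rgb_next, others_next

# Toy usage: RGB plus two supplementary modalities (e.g., depth and LiDAR).
stage = FusionStage(channels=32)
rgb = torch.randn(2, 32, 64, 64)
depth, lidar = torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64)
fused, rgb, others = stage(rgb, [depth, lidar])
print(fused.shape)  # torch.Size([2, 32, 64, 64])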

Environment

conda env create -f environment.yml
conda activate cmnext
# Optional: install apex following https://github.com/NVIDIA/apex
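
After activating the environment, a quick optional sanity check is to confirm that PyTorch and the GPUs are visible:

# Optional: quick sanity check of the cmnext environment.
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
print("GPU count:      ", torch.cuda.device_count())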

Data preparation

Prepare the six datasets: DELIVER, KITTI-360, NYU Depth V2, MFNet, UrbanLF, and MCubeS.

Then, all datasets are structured as:

data/
├── DELIVER
│   ├── img
│   ├── hha
│   ├── event
│   ├── lidar
│   └── semantic
├── KITTI-360
│   ├── data_2d_raw
│   ├── data_2d_hha
│   ├── data_2d_event
│   ├── data_2d_lidar
│   └── data_2d_semantics
├── NYUDepthv2
│   ├── RGB
│   ├── HHA
│   └── Label
├── MFNet
│   ├── rgb
│   ├── ther
│   └── labels
├── UrbanLF
│   ├── Syn
│   └── real
├── MCubeS
│   ├── polL_color
│   ├── polL_aolp
│   ├── polL_dolp
│   ├── NIR_warped
│   └── SS

For the RGB-Depth datasets, the HHA format is generated from the depth images.

Model Zoo

DELIVER dataset

| Model-Modal | #Params (M) | GFLOPs | mIoU | Weight |
| :--- | ---: | ---: | ---: | :--- |
| CMNeXt-RGB | 25.79 | 38.93 | 57.20 | GoogleDrive |
| CMNeXt-RGB-E | 58.69 | 62.94 | 57.48 | GoogleDrive |
| CMNeXt-RGB-L | 58.69 | 62.94 | 58.04 | GoogleDrive |
| CMNeXt-RGB-D | 58.69 | 62.94 | 63.58 | GoogleDrive |
| CMNeXt-RGB-D-E | 58.72 | 64.19 | 64.44 | GoogleDrive |
| CMNeXt-RGB-D-L | 58.72 | 64.19 | 65.50 | GoogleDrive |
| CMNeXt-RGB-D-E-L | 58.73 | 65.42 | 66.30 | GoogleDrive |

KITTI360 dataset

| Model-Modal | mIoU | Weight |
| :--- | ---: | :--- |
| CMNeXt-RGB | 67.04 | GoogleDrive |
| CMNeXt-RGB-E | 66.13 | GoogleDrive |
| CMNeXt-RGB-L | 65.26 | GoogleDrive |
| CMNeXt-RGB-D | 65.09 | GoogleDrive |
| CMNeXt-RGB-D-E | 67.73 | GoogleDrive |
| CMNeXt-RGB-D-L | 66.55 | GoogleDrive |
| CMNeXt-RGB-D-E-L | 67.84 | GoogleDrive |

NYU Depth V2

| Model-Modal | mIoU | Weight |
| :--- | ---: | :--- |
| CMNeXt-RGB-D (MiT-B4) | 56.9 | GoogleDrive |

MFNet

| Model-Modal | mIoU | Weight |
| :--- | ---: | :--- |
| CMNeXt-RGB-T (MiT-B4) | 59.9 | GoogleDrive |

UrbanLF

UrbanLF has both a real and a synthetic subset.

| Model-Modal | Real mIoU | Weight | Syn mIoU | Weight |
| :--- | ---: | :--- | ---: | :--- |
| CMNeXt-RGB | 82.20 | GoogleDrive | 78.53 | GoogleDrive |
| CMNeXt-RGB-LF8 | 83.22 | GoogleDrive | 80.74 | GoogleDrive |
| CMNeXt-RGB-LF33 | 82.62 | GoogleDrive | 80.98 | GoogleDrive |
| CMNeXt-RGB-LF80 | 83.11 | GoogleDrive | 81.02 | GoogleDrive |

MCubeS

| Model-Modal | mIoU | Weight |
| :--- | ---: | :--- |
| CMNeXt-RGB | 48.16 | GoogleDrive |
| CMNeXt-RGB-A | 48.42 | GoogleDrive |
| CMNeXt-RGB-A-D | 49.48 | GoogleDrive |
| CMNeXt-RGB-A-D-N | 51.54 | GoogleDrive |

Training

Before training, please download the pre-trained SegFormer weights, e.g., checkpoints/pretrained/segformer/mit_b2.pth.

checkpoints/pretrained/segformer
├── mit_b2.pth
└── mit_b4.pth
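
If you want to verify a downloaded backbone checkpoint before training, a small inspection script such as the one below can help. It only prints the stored tensor names and shapes; the handling of wrapper keys like "state_dict" or "model" is a guess about the file format, not something the repository requires.

import torch

# Optional: inspect a downloaded SegFormer (MiT) checkpoint before training.
ckpt = torch.load("checkpoints/pretrained/segformer/mit_b2.pth", map_location="cpu")
# Some checkpoint releases wrap the weights under a "state_dict" or "model" key.
for key in ("state_dict", "model"):
    if isinstance(ckpt, dict) and key in ckpt:
        ckpt = ckpt[key]
        break
print(f"{len(ckpt)} tensors stored, first few:")
for name in list(ckpt)[:5]:
    print(" ", name, tuple(ckpt[name].shape))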

To train a CMNeXt model, change the YAML file passed to --cfg. Several training examples using 4 A100 GPUs are shown below:

cd path/to/DELIVER
conda activate cmnext
export PYTHONPATH="path/to/DELIVER"
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/deliver_rgbdel.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/kitti360_rgbdel.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/nyu_rgbd.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/mfnet_rgbt.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/mcubes_rgbadn.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/urbanlf.yaml

Evaluation

To evaluate CMNeXt models, please download the respective model weights (GoogleDrive) and organize them as:

output/
├── DELIVER
│   ├── cmnext_b2_deliver_rgb.pth
│   ├── cmnext_b2_deliver_rgbd.pth
│   ├── cmnext_b2_deliver_rgbde.pth
│   ├── cmnext_b2_deliver_rgbdel.pth
│   ├── cmnext_b2_deliver_rgbdl.pth
│   ├── cmnext_b2_deliver_rgbe.pth
│   └── cmnext_b2_deliver_rgbl.pth
├── KITTI360
│   ├── cmnext_b2_kitti360_rgb.pth
│   ├── cmnext_b2_kitti360_rgbd.pth
│   ├── cmnext_b2_kitti360_rgbde.pth
│   ├── cmnext_b2_kitti360_rgbdel.pth
│   ├── cmnext_b2_kitti360_rgbdl.pth
│   ├── cmnext_b2_kitti360_rgbe.pth
│   └── cmnext_b2_kitti360_rgbl.pth
├── MCubeS
│   ├── cmnext_b2_mcubes_rgb.pth
│   ├── cmnext_b2_mcubes_rgba.pth
│   ├── cmnext_b2_mcubes_rgbad.pth
│   └── cmnext_b2_mcubes_rgbadn.pth
├── MFNet
│   └── cmnext_b4_mfnet_rgbt.pth
├── NYU_Depth_V2
│   └── cmnext_b4_nyu_rgbd.pth
├── UrbanLF
│   ├── cmnext_b4_urbanlf_real_rgblf1.pth
│   ├── cmnext_b4_urbanlf_real_rgblf33.pth
│   ├── cmnext_b4_urbanlf_real_rgblf8.pth
│   ├── cmnext_b4_urbanlf_real_rgblf80.pth
│   ├── cmnext_b4_urbanlf_syn_rgblf1.pth
│   ├── cmnext_b4_urbanlf_syn_rgblf33.pth
│   ├── cmnext_b4_urbanlf_syn_rgblf8.pth
│   └── cmnext_b4_urbanlf_syn_rgblf80.pth

Then, point --cfg to the respective config file and run:

cd path/to/DELIVER
conda activate cmnext
export PYTHONPATH="path/to/DELIVER"
CUDA_VISIBLE_DEVICES=0 python tools/val_mm.py --cfg configs/deliver_rgbdel.yaml

The DeLiVER dataset has both validation and test sets; please check val_mm.py to switch between them.

To evaluate the different cases (adverse weather conditions and sensor failures), modify the cases list in val_mm.py, as shown below:

# cases = ['cloud', 'fog', 'night', 'rain', 'sun']
# cases = ['motionblur', 'overexposure', 'underexposure', 'lidarjitter', 'eventlowres']
cases = [None] # all

Note that the default value, [None], evaluates all cases together.

DELIVER visualization

<img src="figs/DELIVER_vis.png" width="500px">

Visualization results on the DELIVER dataset. From left to right: cloudy, foggy, night-time, and rainy scenes.

Acknowledgements

Thanks to the following public repositories:

License

This repository is under the Apache-2.0 license. For commercial use, please contact the authors.

Citations

If you use the DeLiVER dataset or the CMNeXt model, please cite the following works:

@inproceedings{zhang2023delivering,
  title={Delivering Arbitrary-Modal Semantic Segmentation},
  author={Zhang, Jiaming and Liu, Ruiping and Shi, Hao and Yang, Kailun and Rei{\ss}, Simon and Peng, Kunyu and Fu, Haodong and Wang, Kaiwei and Stiefelhagen, Rainer},
  booktitle={CVPR},
  year={2023}
}
@article{zhang2023cmx,
  title={CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers},
  author={Zhang, Jiaming and Liu, Huayao and Yang, Kailun and Hu, Xinxin and Liu, Ruiping and Stiefelhagen, Rainer},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  year={2023}
}