RGBX_Semantic_Segmentation

Example segmentation

The official implementation of CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers (IEEE T-ITS 2023). More details can be found in our paper [PDF].

Usage

Installation

  1. Requirements

We have tested the code with the following OS and software versions:

  1. Install all dependencies. Install PyTorch, CUDA, and cuDNN, then install the other dependencies via:
pip install -r requirements.txt
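Once the dependencies are installed, a quick sanity check (plain PyTorch, not specific to this repo) can confirm that the GPU stack is visible:

    # Minimal environment sanity check (generic PyTorch; not part of this repo).
    import torch

    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("cuDNN:", torch.backends.cudnn.version())
        print("GPUs:", torch.cuda.device_count())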

Datasets

Organize the dataset folder in the following structure:

<datasets>
|-- <DatasetName1>
    |-- <RGBFolder>
        |-- <name1>.<ImageFormat>
        |-- <name2>.<ImageFormat>
        ...
    |-- <ModalXFolder>
        |-- <name1>.<ModalXFormat>
        |-- <name2>.<ModalXFormat>
        ...
    |-- <LabelFolder>
        |-- <name1>.<LabelFormat>
        |-- <name2>.<LabelFormat>
        ...
    |-- train.txt
    |-- test.txt
|-- <DatasetName2>
|-- ...

train.txt contains the names of the items in the training set, e.g.:

<name1>
<name2>
...
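To illustrate how this layout is consumed, the sketch below assembles (RGB, modal-X, label) path triplets from train.txt. The folder names and file extensions are placeholders taken from the structure above, not the repo's actual dataloader settings.

    # Sketch: build (rgb, modal_x, label) path triplets from train.txt.
    # Folder names and extensions are placeholders matching the layout above,
    # not the repo's actual dataloader.
    import os

    def read_split(dataset_root, split_file="train.txt",
                   rgb_dir="RGBFolder", modalx_dir="ModalXFolder",
                   label_dir="LabelFolder",
                   img_ext=".jpg", modalx_ext=".png", label_ext=".png"):
        with open(os.path.join(dataset_root, split_file)) as f:
            names = [line.strip() for line in f if line.strip()]
        return [(os.path.join(dataset_root, rgb_dir, n + img_ext),
                 os.path.join(dataset_root, modalx_dir, n + modalx_ext),
                 os.path.join(dataset_root, label_dir, n + label_ext))
                for n in names]

    # Example: triplets = read_split("datasets/DatasetName1")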

For RGB-Depth semantic segmentation, HHA maps can be generated from depth maps by following https://github.com/charlesCXK/Depth2HHA-python.

For the preparation of other datasets, please refer to their original websites.

Train

  1. Pretrained weights:

    Download the pretrained SegFormer backbone weights here: pretrained segformer.

  2. Config

    Edit the config file configs.py, including the dataset and network settings.

  3. Run multi-GPU distributed training:

    $ CUDA_VISIBLE_DEVICES="GPU IDs" python -m torch.distributed.launch --nproc_per_node="number of GPUs you want to use" train.py
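For example, with two GPUs this becomes CUDA_VISIBLE_DEVICES="0,1" python -m torch.distributed.launch --nproc_per_node=2 train.py. For reference, torch.distributed.launch starts one process per GPU and passes each a --local_rank argument (newer PyTorch launchers export LOCAL_RANK instead); a minimal receiving script looks roughly like the sketch below. This is generic DDP boilerplate, not the repo's train.py.

    # Generic sketch of the process-group setup torch.distributed.launch expects;
    # standard DDP boilerplate, not the repo's train.py.
    import argparse
    import os
    import torch
    import torch.distributed as dist

    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes --local_rank; newer launchers set LOCAL_RANK instead.
    parser.add_argument("--local_rank", type=int,
                        default=int(os.environ.get("LOCAL_RANK", 0)))
    args = parser.parse_args()

    torch.cuda.set_device(args.local_rank)
    dist.init_process_group(backend="nccl")
    # ... build the model, wrap it in torch.nn.parallel.DistributedDataParallel,
    # and feed the training set through a DistributedSampler ...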
    

Evaluation

Run the evaluation by:

CUDA_VISIBLE_DEVICES="GPU IDs" python eval.py -d="Device ID" -e="epoch number or range"

If you want to use multiple GPUs, please specify multiple device IDs (0,1,2,...).
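The mIoU values reported in the Result section are the standard mean intersection-over-union over classes; as a reference, a minimal computation from a class confusion matrix looks like this (a generic formula, not the repo's eval.py):

    # Generic mIoU from a class-by-class confusion matrix; a reference formula,
    # not the repo's eval.py.
    import numpy as np

    def mean_iou(confusion):
        """confusion[i, j] = pixels with ground-truth class i predicted as class j."""
        confusion = confusion.astype(np.float64)
        tp = np.diag(confusion)
        union = confusion.sum(axis=0) + confusion.sum(axis=1) - tp
        valid = union > 0              # skip classes absent from both GT and prediction
        return (tp[valid] / union[valid]).mean()

    # Example: mean_iou(np.array([[50, 2], [3, 45]]))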

Result

We offer pre-trained weights on different RGB-X datasets (some weights are not available yet; due to differences in training platforms, these weights may not load correctly):

NYU-V2 (40 categories)

| Architecture | Backbone | mIoU (SS) | mIoU (MS & Flip) | Weight |
| --- | --- | --- | --- | --- |
| CMX (SegFormer) | MiT-B2 | 54.1% | 54.4% | NYU-MiT-B2 |
| CMX (SegFormer) | MiT-B4 | 56.0% | 56.3% | - |
| CMX (SegFormer) | MiT-B5 | 56.8% | 56.9% | - |

MFNet (9 categories)

| Architecture | Backbone | mIoU | Weight |
| --- | --- | --- | --- |
| CMX (SegFormer) | MiT-B2 | 58.2% | MFNet-MiT-B2 |
| CMX (SegFormer) | MiT-B4 | 59.7% | - |

ScanNet-V2 (20 categories)

| Architecture | Backbone | mIoU | Weight |
| --- | --- | --- | --- |
| CMX (SegFormer) | MiT-B2 | 61.3% | ScanNet-MiT-B2 |

RGB-Event (20 categories)

| Architecture | Backbone | mIoU | Weight |
| --- | --- | --- | --- |
| CMX (SegFormer) | MiT-B4 | 64.28% | RGBE-MiT-B4 |

Publication

If you find this repo useful, please consider referencing the following paper:

@article{zhang2023cmx,
  title={CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers},
  author={Zhang, Jiaming and Liu, Huayao and Yang, Kailun and Hu, Xinxin and Liu, Ruiping and Stiefelhagen, Rainer},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  year={2023}
}

Acknowledgement

Our code is heavily based on TorchSeg and SA-Gate; thanks for their excellent work!