Awesome

Open-Vocabulary RGB-Thermal Semantic Segmentation

This is the official repository for the OpenRSS (ECCV 2024).

Abstract: RGB-Thermal (RGB-T) semantic segmentation is an important research branch of multi-modal image segmentation. The current RGB-T semantic segmentation methods generally have two unsolved and typical shortcomings. First, they do not have the open-vocabulary recognition ability, which significantly limits their application scenarios. Second, when fusing RGB and thermal images, they often need to design complex fusion network structures, which usually results in low network training efficiency. We present OpenRSS, the Open-vocabulary RGB-T Semantic Segmentation method, to solve these two disadvantages. To our knowledge, OpenRSS is the first RGB-T semantic segmentation method with open-vocabulary segmentation capability. OpenRSS modifies the basic segmentation model SAM for RGB-T semantic segmentation by adding the proposed thermal information prompt module and dynamic low-rank adaptation strategy to SAM. These designs effectively fuse the RGB and thermal information, but with much fewer trainable parameters than other methods. OpenRSS achieves the open-vocabulary capability by jointly utilizing the vision-language model CLIP and the modified SAM. Through extensive experiments, OpenRSS demonstrates its effective open-vocabulary semantic segmentation ability on RGB-T images. It outperforms other state-of-the-art RGB open-vocabulary semantic segmentation methods on multiple RGB-T semantic segmentation benchmarks: +12.1% mIoU on the MFNet dataset, +18.4% mIoU on the MCubeS dataset, and +21.4% mIoU on the Freiburg Thermal dataset.

openrss

Installation

Requirements

PyTorch 1.11.0
Python 3.8(ubuntu20.04)
Cuda 11.3

Install required packages

pip install -r requirements.txt

Dataset

Download datasets and place them in 'datasets' folder in the following structure:

<datasets>
|-- <MFdataset>
    |-- <images>
    |-- <labels>
    |-- train.txt
    |-- val.txt
    |-- test.txt
    ...
|-- <MCUbeSdataset>
    |-- <list_folder>
    |-- <NIR_warped>
    |-- <poIL_color>
    |-- <SSGT4MS>
    ...
|-- <KPdataset>
    |-- <images>
        |-- set00
        |-- set01
        ...
    |-- <labels>
    |-- train.txt
    |-- val.txt
    |-- test.txt
    ...
|-- <FTdataset>
    |-- <test>
        |-- <label>
        |-- <rgb>
        |-- <th>
    ...

Training

python train_KP.py -dr [data_dir] -ls 0.0005 -b 4 -em 300

Evaluation

python own_MF.py -dr [data_dir] -d [test] -f best.pth
python own_FT.py -dr [data_dir] -d [test] -f best.pth
python own_MCubeS.py -dr [data_dir] -d [test] -f best.pth

RESULTS

result

Checkpoints

The pre-trained checkpoints of OpenRSS can be downloaded from

https://drive.google.com/drive/folders/10BF2LhcQyjzNGfd5CNOGHlBs_WX8Zg4Y

Citation

If you find OpenRSS is helpful for your research, please consider giving it a star and citing it:

@inproceedings{OpenRSS,
  title={Open-Vocabulary RGB-Thermal Semantic Segmentation},
  author={Zhao, Guoqiang and Huang, Junjie and Yan, Xiaoyun and Wang, Zhaojing and Tang, Junwei and Ou, Yangjun and Hu, Xinrong and Peng, Tao},
  booktitle={Computer Vision -- ECCV 2024},
  year={2025},
  pages={304--320}
}