Object-Aware NIR-to-Visible Translation (ECCV 2024)

We provide the PyTorch implementation of "Object-Aware NIR-to-Visible Translation" (ECCV 2024) by Yunyi Gao, Lin Gu, Qiankun Liu, and Ying Fu.

Abstract

While near-infrared (NIR) imaging is essential for assisted driving and safety monitoring systems, its monochromatic output and limited detail hinder broader application, motivating NIR-to-visible translation. However, existing translation methods are limited because they overlook the disparities between NIR and visible imaging and lack paired training data. To address these challenges, we propose a novel object-aware framework for NIR-to-visible translation. Our approach decomposes visible-image recovery into object-independent luminance sources and object-specific reflective components, processing them separately to bridge the gap between NIR and visible imaging under various lighting conditions. Leveraging prior segmentation knowledge further helps the model identify and understand the separated object reflections. We also collect the Fully Aligned NIR-Visible Image Dataset (FANVID), a large-scale dataset of fully matched NIR and visible image pairs captured with a multi-sensor coaxial camera. Empirical evaluations demonstrate the superiority of our approach over existing methods, producing visually compelling results on mainstream datasets.
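
The decomposition described above can be pictured as two parallel branches whose outputs are recombined into an RGB estimate. The snippet below is only a minimal sketch of that idea under our own assumptions (module names, channel widths, segmentation-prior format, and the multiplicative fusion are illustrative); it is not the released architecture.

```python
# Minimal sketch of the object-aware decomposition idea (illustrative only;
# module names, channel widths, and the fusion rule are assumptions, not the
# released model).
import torch
import torch.nn as nn

class NIR2VisSketch(nn.Module):
    def __init__(self, seg_classes: int = 19, feat: int = 32):
        super().__init__()
        # Object-independent luminance branch: predicts a 1-channel lighting map.
        self.luminance = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, 3, padding=1),
        )
        # Object-specific reflection branch: conditioned on a segmentation prior.
        self.reflection = nn.Sequential(
            nn.Conv2d(1 + seg_classes, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 3, 3, padding=1),
        )

    def forward(self, nir, seg_prior):
        # nir: (B, 1, H, W); seg_prior: (B, seg_classes, H, W) one-hot masks.
        lum = self.luminance(nir)                                    # shared lighting term
        refl = self.reflection(torch.cat([nir, seg_prior], dim=1))   # per-object color
        return torch.sigmoid(lum * refl)                             # recombined RGB estimate

if __name__ == "__main__":
    model = NIR2VisSketch()
    out = model(torch.rand(1, 1, 64, 64), torch.rand(1, 19, 64, 64))
    print(out.shape)  # torch.Size([1, 3, 64, 64])
```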

Highlights

<img width="726" alt="image" src="https://github.com/Yiiclass/Sherry/assets/69071622/91c9cc8a-b93b-441d-8b8f-22b5dfaa4a62"> <img width="518" alt="image" src="https://github.com/Yiiclass/Sherry/assets/69071622/cc3e6f01-55e5-4290-84b0-7423d464ef7e">

Dataset

Download our FANVID dataset from OneDrive or Baidu Netdisk (extraction code: Cool).

The datasets used for evaluation are EPFL and ICVL. The expected directory layout is shown below, followed by a minimal loading sketch.

dataset
+-- 'Train'
|   +-- paired_NIR1
|   |   +-- Train_00001.bmp
|   |   +-- Train_00002.bmp
|   |   +-- Train_00003.bmp
|   |   +-- Train_00004.bmp
|   |   +-- ...
|   +-- paired_NIR2
|   |   +-- Train_00001.bmp
|   |   +-- Train_00002.bmp
|   |   +-- Train_00003.bmp
|   |   +-- Train_00004.bmp
|   |   +-- ...
|   +-- paired_RGB
|   |   +-- Train_00001.bmp
|   |   +-- Train_00002.bmp
|   |   +-- Train_00003.bmp
|   |   +-- Train_00004.bmp
|   |   +-- ...
|   +-- seg_mask2former_NIR1
|   |   +-- Train_00001.npy
|   |   +-- Train_00002.npy
|   |   +-- Train_00003.npy
|   |   +-- Train_00004.npy
|   |   +-- ...
|   +-- seg_mask2former_NIR2
|   |   +-- Train_00001.npy
|   |   +-- Train_00002.npy
|   |   +-- Train_00003.npy
|   |   +-- Train_00004.npy
|   |   +-- ...


+-- 'Test'
|   +-- paired_NIR1
|   |   +-- Test_00001.bmp
|   |   +-- Test_00002.bmp
|   |   +-- Test_00003.bmp
|   |   +-- Test_00004.bmp
|   |   +-- ...
|   +-- paired_NIR2
|   |   +-- Test_00001.bmp
|   |   +-- Test_00002.bmp
|   |   +-- Test_00003.bmp
|   |   +-- Test_00004.bmp
|   |   +-- ...
|   +-- paired_RGB
|   |   +-- Test_00001.bmp
|   |   +-- Test_00002.bmp
|   |   +-- Test_00003.bmp
|   |   +-- Test_00004.bmp
|   |   +-- ...
|   +-- seg_mask2former_NIR1
|   |   +-- Test_00001.npy
|   |   +-- Test_00002.npy
|   |   +-- Test_00003.npy
|   |   +-- Test_00004.npy
|   |   +-- ...
|   +-- seg_mask2former_NIR2
|   |   +-- Test_00001.npy
|   |   +-- Test_00002.npy
|   |   +-- Test_00003.npy
|   |   +-- Test_00004.npy
|   |   +-- ...
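
To make the layout concrete, here is an illustrative sketch of how one paired training sample could be read. The paths mirror the tree above, but the helper itself is an assumption rather than part of the released code, and the exact contents of the .npy segmentation priors may differ.

```python
# Illustrative loader for one FANVID training pair (not part of the released code).
import cv2
import numpy as np

def load_pair(root: str, idx: int, band: str = "NIR1"):
    """Read one NIR frame (as grayscale here), its RGB target, and the
    Mask2Former segmentation prior stored as .npy."""
    name = f"Train_{idx:05d}"
    nir = cv2.imread(f"{root}/Train/paired_{band}/{name}.bmp", cv2.IMREAD_GRAYSCALE)
    rgb = cv2.imread(f"{root}/Train/paired_RGB/{name}.bmp", cv2.IMREAD_COLOR)
    seg = np.load(f"{root}/Train/seg_mask2former_{band}/{name}.npy")
    return nir, cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB), seg

# Example: nir, rgb, seg = load_pair("dataset", 1)
```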

Usage

conda create -n Sherry python=3.7
conda activate Sherry
conda install pytorch=1.11 torchvision cudatoolkit=11.3 -c pytorch
pip install matplotlib scikit-learn scikit-image opencv-python yacs joblib natsort h5py tqdm tensorboard
pip install einops gdown addict future lmdb numpy pyyaml requests scipy yapf lpips
python setup.py develop --no_cuda_ext
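
Once the environment is built, a quick sanity check (an optional snippet of ours, not a script in this repository) confirms that the pinned PyTorch build can see the GPU:

```python
# Optional environment sanity check (not part of the repository).
import torch

print(torch.__version__)           # expected: 1.11.x
print(torch.cuda.is_available())   # expected: True with CUDA 11.3 drivers installed
```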

Training

python3 basicsr/train.py --opt Options/NIR1.yml --gpu_id 0

Testing

python3 Enhancement/test_from_dataset.py --opt Options/NIR1.yml --gpu_id 2 --weights weight_path --dataset dataset

Here, weight_path is the path to the trained checkpoint and dataset is the name of the evaluation dataset.

Results

Quantitative comparison on the FANVID dataset

FANVID NIR1 and FANVID NIR2 denote using the 700-800 nm band (NIR1) or the 820-1100 nm band (NIR2) images as input. All methods were retrained on both the NIR and RGB domains of our FANVID dataset, so inputs and settings are consistent across comparisons.

| Method | PSNR ↑ (NIR1) | SSIM ↑ (NIR1) | Delta-E ↓ (NIR1) | FID ↓ (NIR1) | PSNR ↑ (NIR2) | SSIM ↑ (NIR2) | Delta-E ↓ (NIR2) | FID ↓ (NIR2) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Retinexformer | 24.61 | 0.86 | 6.55 | 39.72 | 22.46 | 0.79 | 8.64 | 51.01 |
| CT2 | 17.41 | 0.68 | 20.82 | 52.75 | 14.80 | 0.51 | 29.45 | 61.43 |
| FastCUT | 18.65 | 0.71 | 15.94 | 44.29 | 16.74 | 0.63 | 20.03 | 58.96 |
| pix2pix | 20.10 | 0.70 | 12.42 | 54.34 | 18.00 | 0.60 | 15.90 | 66.01 |
| CycleGAN | 18.63 | 0.71 | 16.33 | 45.72 | 16.35 | 0.61 | 21.58 | 52.02 |
| NIRcolor | 15.71 | 0.56 | 27.38 | 47.70 | 14.19 | 0.46 | 32.39 | 60.60 |
| TLM | 20.65 | 0.75 | 11.23 | 49.79 | 18.76 | 0.66 | 14.47 | 63.25 |
| Ours | 25.57 | 0.87 | 5.78 | 37.15 | 23.37 | 0.80 | 7.61 | 48.98 |

Quantitative comparison on the EPFL and ICVL datasets

All methods were retrained on both the NIR and RGB domains of the EPFL/ICVL datasets, so inputs and settings are consistent across comparisons.

| Method | PSNR ↑ (EPFL) | SSIM ↑ (EPFL) | Delta-E ↓ (EPFL) | FID ↓ (EPFL) | PSNR ↑ (ICVL) | SSIM ↑ (ICVL) | Delta-E ↓ (ICVL) | FID ↓ (ICVL) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Retinexformer | 17.93 | 0.64 | 14.89 | 130.67 | 27.12 | 0.89 | 7.78 | 88.31 |
| CT2 | 12.68 | 0.29 | 27.03 | 116.73 | 17.96 | 0.70 | 20.91 | 134.53 |
| FastCUT | 10.30 | 0.10 | 33.77 | 255.39 | 18.98 | 0.65 | 18.70 | 169.97 |
| pix2pix | 16.90 | 0.55 | 16.17 | 121.02 | 24.84 | 0.81 | 9.70 | 124.05 |
| CycleGAN | 15.13 | 0.55 | 21.87 | 119.64 | 19.58 | 0.63 | 18.25 | 169.91 |
| NIRcolor | 13.99 | 0.53 | 29.37 | 150.60 | 16.36 | 0.69 | 25.52 | 142.85 |
| TLM | 15.63 | 0.49 | 19.08 | 193.17 | 24.53 | 0.82 | 9.61 | 130.51 |
| Ours | 18.41 | 0.65 | 13.85 | 113.90 | 27.47 | 0.90 | 7.43 | 82.95 |
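
For reference, the per-image metrics PSNR, SSIM, and Delta-E can be reproduced with scikit-image, which is already in the installation list above. The snippet below is only a hedged sketch of such an evaluation (it uses the CIE76 color difference and is not the script that produced the tables); FID requires a separate feature-based tool and is omitted here.

```python
# Hedged sketch of the per-image metrics (not the repository's evaluation script).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from skimage.color import rgb2lab, deltaE_cie76

def evaluate(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: uint8 RGB arrays of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    # Delta-E: mean CIE76 color difference computed in Lab space.
    delta_e = deltaE_cie76(rgb2lab(gt), rgb2lab(pred)).mean()
    return psnr, ssim, delta_e
```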

Citation

If you find this work useful for your research, please cite:

@inproceedings{gao2025object,
  title={Object-Aware NIR-to-Visible Translation},
  author={Gao, Yunyi and Gu, Lin and Liu, Qiankun and Fu, Ying},
  booktitle={European Conference on Computer Vision},
  pages={93--109},
  year={2025},
  organization={Springer}
}

Contact

If you have any questions or problems, please feel free to contact me at yiiclass@qq.com.