



Jinyuan Liu, Xin Fan*, Zhanbo Huang, Guanyao Wu, Risheng Liu, Wei Zhong, Zhongxuan Luo, “Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection”, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. (Oral)


<h2> <p align="center"> M3FD Dataset </p> </h2>


The preview of our dataset is as follows.

preview gif



If you have any questions or suggestions about the dataset, please email Guanyao Wu or Jinyuan Liu.

<h2> <p align="center"> TarDAL Fusion </p> </h2>


In our experiments, we used the following outstanding works as baselines.

Note: Sorted alphabetically

Quick Start

If you are only curious about the results of the fusion task, we have prepared an online demonstration.

Our online preview (free) in Colab.

Set Up on Your Own Machine

To dive deeper or apply TarDAL at a larger scale, you can set it up on your own computer by following the steps below.

Virtual Environment

We strongly recommend that you use Conda as a package manager.

# create virtual environment
conda create -n tardal python=3.10
conda activate tardal
# select pytorch version yourself
# install tardal requirements
pip install -r requirements.txt
# install yolov5 requirements
pip install -r module/detect/requirements.txt
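After installation, a quick check can confirm that the environment is usable. The snippet below is a minimal sketch and not part of the TarDAL repository: the helper name `describe_environment` is hypothetical, and it only reports the Python version and whether PyTorch can be imported and can see a CUDA device.

```python
import importlib.util
import sys


def describe_environment() -> dict:
    """Collect basic facts about the active environment (illustrative helper)."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
    }
    if info["torch_installed"]:
        # Imported lazily so the check also works before PyTorch is installed.
        import torch

        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `tardal` environment. If `torch_installed` is `False`, install the PyTorch build that matches your hardware before continuing.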

Data Preparation

Place the data in the following directory layout.

├── data
|   ├── m3fd
|   |   ├── ir # infrared images
|   |   ├── vi # visible images
|   |   ├── labels # labels in txt format (yolo format)
|   |   └── meta # meta data, includes: pred.txt, train.txt, val.txt
|   ├── tno
|   |   ├── ir # infrared images
|   |   ├── vi # visible images
|   |   └── meta # meta data, includes: pred.txt, train.txt, val.txt
|   ├── roadscene
|   └── ...
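The `labels` directory in the layout above holds YOLO-format text files. By the usual YOLO convention, each line reads `class x_center y_center width height`, with the four coordinates normalized to the image size. As a sketch under that assumption (the helper names are hypothetical and not part of the repository), a label file can be converted to pixel corner coordinates like this:

```python
from typing import List, Tuple

# (class_id, x_min, y_min, x_max, y_max) in pixels
Box = Tuple[int, float, float, float, float]


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one normalized YOLO label line to pixel corner coordinates."""
    cls, xc, yc, w, h = line.split()[:5]
    xc_px, yc_px = float(xc) * img_w, float(yc) * img_h
    w_px, h_px = float(w) * img_w, float(h) * img_h
    return (int(cls), xc_px - w_px / 2, yc_px - h_px / 2,
            xc_px + w_px / 2, yc_px + h_px / 2)


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> List[Box]:
    """Parse the content of a whole label file, skipping blank lines."""
    return [parse_yolo_line(line, img_w, img_h)
            for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # The image size here is an arbitrary example, not read from the dataset.
    print(parse_yolo_labels("0 0.5 0.5 0.25 0.5\n", img_w=1024, img_h=768))
    # → [(0, 384.0, 192.0, 640.0, 576.0)]
```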

You can directly download the TNO and RoadScene datasets organized in this format from here.
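Whether the data is downloaded or assembled by hand, it can help to verify the layout before running anything. The following is a minimal sketch, not a tool shipped with TarDAL: `check_dataset` is a hypothetical helper, and it assumes that each infrared image and its visible counterpart share the same file name.

```python
from pathlib import Path
from typing import List

# File names taken from the directory layout described above.
META_FILES = ("pred.txt", "train.txt", "val.txt")


def check_dataset(root: Path, needs_labels: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    problems: List[str] = []
    required = ["ir", "vi", "meta"] + (["labels"] if needs_labels else [])
    for sub in required:
        if not (root / sub).is_dir():
            problems.append(f"missing directory: {root / sub}")

    meta = root / "meta"
    if meta.is_dir():
        for name in META_FILES:
            if not (meta / name).is_file():
                problems.append(f"missing meta file: {meta / name}")

    ir_dir, vi_dir = root / "ir", root / "vi"
    if ir_dir.is_dir() and vi_dir.is_dir():
        # Assumption: paired images use identical file names in ir/ and vi/.
        ir_names = {p.name for p in ir_dir.iterdir() if p.is_file()}
        vi_names = {p.name for p in vi_dir.iterdir() if p.is_file()}
        for name in sorted(ir_names ^ vi_names):
            problems.append(f"unpaired image: {name}")
    return problems


if __name__ == "__main__":
    for issue in check_dataset(Path("data/m3fd"), needs_labels=True):
        print(issue)
```

Pass `needs_labels=True` for datasets that ship a `labels` directory, such as `m3fd` in the layout above.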

Fuse or Eval

In this section, we will guide you to generate fusion images using our pre-trained model.

As we mentioned in our paper, we provide three pre-trained models.

TarDAL-DT: Optimized for human vision. (Default)
TarDAL-TT: Optimized for object detection.
TarDAL-CT: Optimal solution for joint human vision and detection accuracy.

You can find the corresponding configuration files in configs.

Some settings you should pay attention to:

Normally you do not need to download the model parameters manually; the program downloads them for you.

# use official tardal-dt infer config and save images to runs/tardal-dt
python infer.py --cfg configs/official/tardal-dt.yaml --save_dir runs/tardal-dt
# use official tardal-tt infer config and save images to runs/tardal-tt
python infer.py --cfg configs/official/tardal-tt.yaml --save_dir runs/tardal-tt
# use official tardal-ct infer config and save images to runs/tardal-ct
python infer.py --cfg configs/official/tardal-ct.yaml --save_dir runs/tardal-ct
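To run all three variants in one go, the commands above can be generated from a small script. This is only a convenience sketch: `build_infer_command` and `run_all` are hypothetical helpers, and the only flags used are the `--cfg` and `--save_dir` options shown above.

```python
import subprocess
import sys
from typing import List

VARIANTS = ("tardal-dt", "tardal-tt", "tardal-ct")


def build_infer_command(variant: str) -> List[str]:
    """Assemble the infer.py invocation for one official config."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return [
        sys.executable,  # the interpreter of the active environment
        "infer.py",
        "--cfg", f"configs/official/{variant}.yaml",
        "--save_dir", f"runs/{variant}",
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for variant in VARIANTS:
        cmd = build_infer_command(variant)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call `run_all(dry_run=False)` from the repository root to actually launch the three runs one after another.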


We provide training scripts so that you can train your own model.

Please note: The training code is only intended to assist in understanding the paper and is not recommended for direct application in production environments.

Unlike previous versions of the code, you do not need to preprocess the data; the IQA weights and masks are calculated automatically.

python train.py --cfg configs/official/tardal-dt.yaml --auth $YOUR_WANDB_KEY
python train.py --cfg configs/official/tardal-tt.yaml --auth $YOUR_WANDB_KEY
python train.py --cfg configs/official/tardal-ct.yaml --auth $YOUR_WANDB_KEY
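The training commands take a Weights & Biases key through `--auth`. If you launch training from a script, it is safer to read the key from the environment than to hard-code it. The sketch below assumes the key is stored in the `YOUR_WANDB_KEY` variable used in the commands above; `build_train_command` and `redact` are hypothetical helpers, not part of the repository.

```python
import os
import sys
from typing import List, Mapping


def build_train_command(variant: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Assemble the train.py invocation, taking the key from the environment."""
    key = env.get("YOUR_WANDB_KEY", "")
    if not key:
        raise RuntimeError("set YOUR_WANDB_KEY before training")
    return [
        sys.executable,
        "train.py",
        "--cfg", f"configs/official/{variant}.yaml",
        "--auth", key,
    ]


def redact(cmd: List[str]) -> str:
    """Render a command for logging with the value after --auth hidden."""
    shown = list(cmd)
    if "--auth" in shown:
        idx = shown.index("--auth") + 1
        if idx < len(shown):
            shown[idx] = "***"
    return " ".join(shown)
```

Logging `redact(cmd)` instead of the raw command keeps the key out of terminal history and log files.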

If you want to base your approach on ours and extend it to a production environment, here are some additional suggestions for you.

Suggestion: A better train process for everyone.

Any Question

If you have any other questions about the code, please email Zhanbo Huang.

Due to a job change, the previous email address zbhuang@mail.dlut.edu.cn is no longer available.


If this work has been helpful to you, please feel free to cite our paper!

  title={Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection},
  author={Liu, Jinyuan and Fan, Xin and Huang, Zhanbo and Wu, Guanyao and Liu, Risheng and Zhong, Wei and Luo, Zhongxuan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},