Awesome
PyTorch implementation of "LMAFormer: Local Motion Aware Transformer for Small Moving Infrared Target Detection"
<hr>Abstract
<p align="justify"> In temporal infrared small target detection, it is crucial to leverage the disparities in spatiotemporal characteristics between the target and the background to distinguish the former. However, remote imaging and the relative motion between the detection platform and the background cause significant coupling of spatiotemporal characteristics, making target detection highly challenging. To address these challenges, we propose a network named LMAFormer. First, we introduce a local motion-aware spatiotemporal attention mechanism that aligns and enhances multiframe features to extract local spatiotemporal salient features of targets while avoiding interference from moving backgrounds. Second, we employ a multiscale fusion transformer encoder that computes self-attention weights across and within scales during encoding, to establish multiscale correlations among different regions of temporal images, enabling motion background modeling. Last, we propose a multiframe joint query decoder. The shallowest feature map after multiscale feature propagation is mapped to initial query weights, which are refined through grouped convolutions to generate grouped query vectors. These are jointly optimized to encapsulate rich multiframe details, strengthening motion background modeling and target feature representation, improving prediction accuracy. Experimental results on the NUDT-MIRSDT, IRDST, and the established TSIRMT datasets demonstrate that our network outperforms state-of-the-art (SOTA) methods. </p>Architecture
<p align="center"> <img src="pic/LMAFormer_fig_1.png" width="auto" alt="accessibility text"> </p> Overall Architecture of LMAFormer.Installation
Environment Setup
The experiments were done on Windows11 with python 3 using anaconda environment. Here is details on how to set up the conda environment. (If you do not have anaconda 3 installed, first do it following the set up instruction from here)
-
Create conda environment:
conda create -n LMAFormer python=3 conda activate LMAFormer
-
Install PyTorch from here.
-
Install MultiScaleDeformableAttention module:
python ./MFIRSTD/models/ops/setup.py install
-
Install other requirements:
pip install -r requirements.txt
Datasets
We evaluate network performance using NUDR-MIRSDT, IRDST and a self-built dataset TSIRMT
Download the datasets following the corresponding paper/project page and update dataset paths in 'datasets/path_config.py'. Here is the list of datasets used.
- NUDT-MIRSDT (Extraction code: 5whn)
- IRDST
- TSIRMT
The NUDT-MIRSDT dataset requires the 'rearrange_dataset.py' program for path reconstruction, with 'rearrange_dataset.py' located at '/Datasets/rearrange_dataset.py'. The dataset division is subject to the method in the Datasets folder.
Download Trained Models
Pretrained Swin backbones can be downloaded from their corresponding repository.
If you are interested in evaluating only, you can download the selected trained LMAFormer checkpoints from the links in the results table.
Training
The models were trained and tested using a single NVIDIA 4080 GPU.
- Train LMAFormer with Swin backbone on NUDT-MIRSDR, IRDST, TSIRMT datasets:
python train_swin.py
Inference
Inference on NUDT-MIRSDT:
```
python inference_swin.py --model_path ./result/TSIRMT/checkpoint_NUDT-MIRSDT.pth --dataset NUDT-MIRSDT --val_size 400 --flip --msc --output_dir ./predict/NUDT-MIRSDT
```
Expected miou: 73.26
Inference on IRDST:
```
python inference_swin.py --model_path ./result/IRDST/checkpoint_IRDST.pth --dataset IRDST --val_size 400 --flip --msc --output_dir ./predict/IRDST
```
Expected miou: 59.17
Inference on TSIRMT:
```
python inference_swin.py --model_path ./result/TSIRMT/checkpoint_TSIRMT.pth --dataset TSIRMT --val_size 400 --flip --msc --output_dir ./predict/TSIRMT
```
Expected miou: 65.89
Results Summary
Results on NUDT-MIRSDT, IRDST and TSIRMT
Dataset | Checkpoint | IoU | nIoU | Pd | Fa |
---|---|---|---|---|---|
NUDT-MIRSDT | checkpoint | 73.26 | 73.63 | 99.68 | 0.71 |
IRDST | checkpoint | 59.17 | 57.51 | 99.64 | 14.95 |
TSIRMT | checkpoint | 65.89 | 65.63 | 86.10 | 185.78 |
Acknowledgement
We would like to thank the open-source projects with special thanks to DETR and VisTR for making their code public. Part of the code in our project are collected and modified from several open source repositories.
Citation
Please consider citing our paper in your publications if the project helps your research. BibTeX reference is as follow.
@ARTICLE{10758760,
author={Huang, Yuanxin and Zhi, Xiyang and Hu, Jianming and Yu, Lijian and Han, Qichao and Chen, Wenbin and Zhang, Wei},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={LMAFormer: Local Motion Aware Transformer for Small Moving Infrared Target Detection},
year={2024},
volume={62},
number={},
pages={1-17},
keywords={Feature extraction;Object detection;Transformers;Decoding;Three-dimensional displays;Computational modeling;Deep learning;Annotations;Visualization;Urban areas;Infrared small moving target detection;local motion aware;multiframe joint query;multiscale transformer encoder},
doi={10.1109/TGRS.2024.3502663}}