
Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery

Authors: FANG Qingyun and WANG Zhaokui

Intro

CMAFF: Cross-Modality Attentive Feature Fusion

<div align="left"> <img src="https://github.com/DocF/CMAFF/blob/main/yolofusion.png" width="700"> </div>

Differential Enhancive Module

<div align="left"> <img src="https://github.com/DocF/CMAFF/blob/main/DMEA.png" width="600"> </div>
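
The figure above summarizes the differential branch. As a rough illustration of the idea (modality-specific information is carried by the difference between the RGB and IR feature maps and re-injected via channel attention), here is a minimal PyTorch sketch. The class name `DifferentialEnhanciveModule`, the squeeze-and-excitation-style MLP, and the residual enhancement are our assumptions for illustration, not the repository's released code.

```python
import torch
import torch.nn as nn


class DifferentialEnhanciveModule(nn.Module):
    """Illustrative sketch: channel attention computed from the differential
    (RGB minus IR) features and used to enhance each modality's own features."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_rgb: torch.Tensor, f_ir: torch.Tensor):
        # Differential modality: information one spectrum carries and the other lacks.
        f_diff = f_rgb - f_ir                                    # (B, C, H, W)
        attn = self.sigmoid(self.mlp(f_diff.mean(dim=(2, 3))))  # (B, C)
        attn = attn.view(attn.size(0), -1, 1, 1)
        # Retain each modality's features and enhance them with the differential attention.
        return f_rgb + f_rgb * attn, f_ir + f_ir * attn
```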

Common Selective Module

<div align="left"> <img src="https://github.com/DocF/CMAFF/blob/main/CMSA.png" width="600"> </div>
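
Analogously, the common branch can be sketched as channel attention over the shared features that softly selects, per channel, between the two modalities. Again, `CommonSelectiveModule`, the two-head design, and the softmax selection are illustrative assumptions rather than the official implementation (the sketch reuses the imports from the block above).

```python
class CommonSelectiveModule(nn.Module):
    """Illustrative sketch: channel attention computed from the common (shared)
    features, with a softmax over two heads that cherry-picks, per channel,
    between the RGB and IR feature maps."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = channels // reduction
        self.squeeze = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(inplace=True))
        self.head_rgb = nn.Linear(hidden, channels)
        self.head_ir = nn.Linear(hidden, channels)

    def forward(self, f_rgb: torch.Tensor, f_ir: torch.Tensor) -> torch.Tensor:
        f_common = 0.5 * (f_rgb + f_ir)                # modality-shared content
        z = self.squeeze(f_common.mean(dim=(2, 3)))    # (B, hidden)
        weights = torch.softmax(
            torch.stack([self.head_rgb(z), self.head_ir(z)], dim=0), dim=0
        )                                              # (2, B, C), sums to 1 over modalities
        w_rgb = weights[0].view(weights.size(1), -1, 1, 1)
        w_ir = weights[1].view(weights.size(1), -1, 1, 1)
        # Pick each channel mainly from the modality that represents it better.
        return f_rgb * w_rgb + f_ir * w_ir
```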

Fusing the complementary information of multispectral remote sensing image pairs across modalities can improve the perception ability of detection algorithms, making them more robust and reliable for a wider range of applications, such as nighttime detection. In contrast to prior methods, we argue that different kinds of features should be treated differently: modality-specific features should be retained and enhanced, while modality-shared features should be cherry-picked from the RGB and thermal IR modalities. Following this idea, we propose a novel and lightweight multispectral feature fusion approach with joint common-modality and differential-modality attentions, named Cross-Modality Attentive Feature Fusion (CMAFF). Given the intermediate feature maps of RGB and IR images, our module infers attention maps from the two modalities, common and differential, in parallel; the attention maps are then multiplied with the corresponding input feature maps for adaptive feature enhancement or selection. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art performance at a low computational cost. For more details, please refer to our paper.
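
To make the description above concrete, the following sketch wires the two illustrative modules from the previous blocks into a single fusion block and checks the output shape on dummy intermediate feature maps. `CMAFFBlock` and the simple additive merge are placeholders chosen for clarity; refer to the paper and the code in this repository for the exact formulation.

```python
class CMAFFBlock(nn.Module):
    """Illustrative composition of the two sketched attentions: enhance the
    modality-specific features, select the modality-shared ones, then merge."""

    def __init__(self, channels: int):
        super().__init__()
        self.dem = DifferentialEnhanciveModule(channels)  # sketched above
        self.csm = CommonSelectiveModule(channels)        # sketched above

    def forward(self, f_rgb: torch.Tensor, f_ir: torch.Tensor) -> torch.Tensor:
        f_rgb_enh, f_ir_enh = self.dem(f_rgb, f_ir)  # differential enhancement
        f_shared = self.csm(f_rgb, f_ir)             # common selection
        return f_rgb_enh + f_ir_enh + f_shared       # simple additive merge (assumption)


# Shape check on dummy intermediate feature maps (e.g. one backbone stage).
rgb = torch.randn(2, 256, 40, 40)
ir = torch.randn(2, 256, 40, 40)
print(CMAFFBlock(256)(rgb, ir).shape)  # torch.Size([2, 256, 40, 40])
```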

Citation

If you find this repo useful for your research, please consider citing our paper:

@article{qingyun2022cross,
  title={Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery},
  author={Qingyun, Fang and Zhaokui, Wang},
  journal={Pattern Recognition},
  pages={108786},
  year={2022},
  publisher={Elsevier}
}

Results

| Model   | Attention | Params (M) | FLOPs (M) | Mem R+W (MB) |
|---------|-----------|-----------:|----------:|-------------:|
| Yolov5l | MCFF_1    | 0.03       | 0.06      | 0.13         |
|         | MCFF_2    | 0.06       | 0.13      | 0.26         |
|         | MCFF_3    | 0.16       | 0.50      | 1.02         |
|         | Average   | 0.08       | 0.23      | 0.47         |
| Yolov5l | GFU_1     | 2.38       | 30400     | 103.25       |
|         | GFU_2     | 9.50       | 30400     | 84.88        |
|         | GFU_3     | 38.00      | 30400     | 175.44       |
|         | Average   | 16.63      | 30400     | 121.19       |
| Yolov5l | CMAFF_1   | 0.04       | 0.08      | 0.16         |
|         | CMAFF_2   | 0.08       | 0.16      | 0.33         |
|         | CMAFF_3   | 0.31       | 0.62      | 1.28         |
|         | Average   | 0.14       | 0.29      | 0.59         |
| Yolov5s | MCFF_1    | 0.02       | 0.03      | 0.07         |
|         | MCFF_2    | 0.03       | 0.06      | 0.13         |
|         | MCFF_3    | 0.06       | 0.13      | 0.26         |
|         | Average   | 0.04       | 0.07      | 0.15         |
| Yolov5s | GFU_1     | 0.59       | 7600      | 49.25        |
|         | GFU_2     | 9.50       | 7600      | 32.94        |
|         | GFU_3     | 38.00      | 7600      | 49.72        |
|         | Average   | 4.16       | 7600      | 43.97        |
| Yolov5s | CMAFF_1   | 0.02       | 0.04      | 0.08         |
|         | CMAFF_2   | 0.04       | 0.08      | 0.16         |
|         | CMAFF_3   | 0.08       | 0.16      | 0.33         |
|         | Average   | 0.05       | 0.09      | 0.19         |
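
If you want to reproduce parameter and FLOP counts for your own fusion modules, one common approach is the `thop` package; note that the table above may have been produced with a different tool, and `thop` does not report the Mem R+W column. A minimal sketch, reusing the illustrative `CMAFFBlock` defined earlier in this README:

```python
import torch
from thop import profile  # pip install thop

# Profile the illustrative fusion block on a single pair of feature maps.
module = CMAFFBlock(256)
rgb = torch.randn(1, 256, 40, 40)
ir = torch.randn(1, 256, 40, 40)

macs, params = profile(module, inputs=(rgb, ir))
print(f"Params: {params / 1e6:.2f} M  MACs: {macs / 1e6:.2f} M")
```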