Awesome
Single-stream Extractor Network with Contrastive Pre-training for Remote Sensing Change Captioning
Author: Qing Zhou, Junyu Gao, Yuan Yuan, Qi Wang☨
This repository is the official implementation of SEN and also support RSICCformer, MCCFormer.
Requirements
To install requirements:
pip install -r requirements.txt
Download data form LEVIR-CC and put it in ./LEVIR_CC_dataset/
.:
- Google Drive
- Baidu Pan (code:nq9y)
Then preprocess dataset for training as follows:
python create_input_files.py --min_word_freq 5
Pre-trained models
You can download the pre-trained models from Baidu Pan, it includes the following weights:
- Trained models:
- SEN with ResNet50 pre-trained 300 epochs on SSL4EO-S12.
- MCCFormer-S/D with ResNet50 pre-trained on ImageNet.
- RSICCformer<sub>c</sub> with ResNet101 pre-trained on ImageNet.
- Pre-trained extractor on SSL4EO-S12:
- ResNet18
- ResNet34
- ResNet50
- ResNet101
Training
To train the SEN model, run this command:
CUDA_VISIBLE_DEVICES=0 python train.py \
--more_reproducibility \
--savepath model_checkpoints/SEN --model SEN \
--batch_size 128 --proj_channel 512 \
--encoder_n_layers 2 --ft_layer 4 --model_stage 4 \
--weight_path pretrain_ckpt/rn50.pth.tar
To train the RSICCformer model, run this command:
CUDA_VISIBLE_DEVICES=0 python train.py \
--more_reproducibility \
--savepath model_checkpoints/RSICCformer --model RSICCformer \
--batch_size 128 --encoder_image resnet101 \
--encoder_feat MCCFormers_diff_as_Q --decoder trans
To train the MCCFormer-S/D model, run this command:
CUDA_VISIBLE_DEVICES=0 python train.py \
--more_reproducibility \
--savepath model_checkpoints/MCCFormer-S --model RSICCformer \
--batch_size 128 --encoder_image resnet101 \
--encoder_feat MCCFormers-S --decoder trans \
--n_layer 2 --n_heads 4 --decoder_n_layers 2
Evaluation
To evaluate the SEN model, run:
python eval.py --path ./models_checkpoint/SEN/ --model SEN
To evaluate the RSICCformer, MCCFormer-S/D model, run:
python eval.py --path ./models_checkpoint/RSICCformer/ --model RSICCformer
Result
Method | B@1 | B@2 | B@3 | B@4 | M | R | C | S<sup>∗</sup><sub>𝑚</sub> | P | FPS |
---|---|---|---|---|---|---|---|---|---|---|
Capt-Rep-Diff | 72.90 | 61.98 | 53.62 | 47.41 | 34.47 | 65.64 | 110.57 | 64.52 | - | - |
Capt-Att | 77.64 | 67.40 | 59.24 | 53.15 | 36.58 | 69.73 | 121.22 | 70.17 | - | - |
Capt-Dual-Att | 79.51 | 70.57 | 63.23 | 57.46 | 36.56 | 70.69 | 124.42 | 72.28 | - | - |
DUDA | 81.44 | 72.22 | 64.24 | 57.79 | 37.15 | 71.04 | 124.32 | 72.58 | - | - |
MCCFormer-S | 79.90 | 70.26 | 62.68 | 56.68 | 36.17 | 69.46 | 120.39 | 70.68 | 69.0 | 12.9 |
MCCFormer-D | 80.42 | 70.87 | 62.86 | 56.38 | 37.29 | 70.32 | 124.44 | 72.11 | 69.0 | 12.4 |
RSICCformer_c | 83.09 | 74.32 | 66.66 | 60.44 | 38.76 | 72.63 | 130.00 | 75.46 | 56.2 | 15.0 |
PSNet | 83.86 | 75.13 | 67.89 | 62.11 | 38.80 | 73.60 | 132.62 | 76.78 | - | - |
Δ | +1.24 | +1.92 | +2.12 | +1.98 | +0.79 | +0.97 | +3.40 | +1.79 | -16.3 | +8.7 |
SEN (ours) | 85.10 | 77.05 | 70.01 | 64.09 | 39.59 | 74.57 | 136.02 | 78.57 | 39.9 | 23.7 |
Citation
@ARTICLE{10530145,
author={Zhou, Qing and Gao, Junyu and Yuan, Yuan and Wang, Qi},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Single-Stream Extractor Network With Contrastive Pre-Training for Remote-Sensing Change Captioning},
year={2024},
volume={62},
number={},
pages={1-14},
}
Reference
Thanks to the following repository: RSICCformer, SSL4EO-S12