<div align="center">
<h2>A Change Detection Reality Check</h2>
Isaac Corley<sup>1</sup> · Caleb Robinson<sup>2</sup> · Anthony Ortiz<sup>2</sup>

<sup>1</sup>University of Texas at San Antonio&nbsp;&nbsp;<sup>2</sup>Microsoft AI for Good Research Lab

<a href="https://arxiv.org/abs/2402.06994"><img src='https://img.shields.io/badge/arXiv-A%20Change%20Detection%20Reality%20Check-red' alt='Paper PDF'></a>
<a href='https://huggingface.co/isaaccorley/a-change-detection-reality-check'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Checkpoints-yellow'></a>
</div>

Code and experiments for the paper ["A Change Detection Reality Check"](https://arxiv.org/abs/2402.06994) by Isaac Corley, Caleb Robinson, and Anthony Ortiz, presented at the ICLR 2024 Machine Learning for Remote Sensing (ML4RS) Workshop.
## Summary
The remote sensing literature of the past several years has exploded with proposed deep learning architectures claiming state-of-the-art results on standard change detection benchmark datasets. But has the field truly made significant progress? In this paper we perform experiments showing that a simple U-Net segmentation baseline, without training tricks or complicated architectural changes, is still a top performer for the task of change detection.
## Results
We find that U-Net is still a top performer on the LEVIR-CD and WHU-CD benchmark datasets. See the tables below for comparisons with state-of-the-art methods.
<p align="center">
  <img src="./assets/levircd-results.png" width="500"/><br/>
  <b>Table 1.</b> Comparison of state-of-the-art change detection architectures to a U-Net baseline on the LEVIR-CD dataset. We report the test set precision, recall, and F1 metrics of the positive change class. For the baseline experiments we perform 10 runs while varying the random seed and report metrics from the highest performing run. All other metrics are taken from their respective papers. The top performing methods are highlighted in bold. Gray rows indicate our baseline U-Net and siamese encoder variants.
</p>

<p align="center">
  <img src="./assets/whucd-results.png" width="500"/><br/>
  <b>Table 2.</b> Experimental results on the WHU-CD dataset. We retrain several state-of-the-art methods using the original dataset's train/test splits instead of the commonly used randomly split preprocessed version created by Bandara & Patel (2022a). We find that these state-of-the-art methods are outperformed by a U-Net baseline. We report the test set precision, recall, F1, and IoU metrics of the positive change class. For each run we select the model checkpoint with the lowest validation set loss. We provide metrics averaged over 10 runs with varying random seeds as well as the best seed. Gray rows indicate our baseline U-Net and siamese encoder variants.
</p>
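The baseline variants differ only in how the two timesteps are fused: the plain U-Net concatenates the pre- and post-change images channel-wise before the encoder, while the SiamConc/SiamDiff variants run a shared encoder on each image and fuse the feature maps by concatenation or differencing before the decoder. As a minimal sketch of the early-fusion baseline using `segmentation_models_pytorch` (this is our reading of the setup, not a verbatim copy of the training scripts):

```python
import torch
import segmentation_models_pytorch as smp

# Plain U-Net baseline: early fusion by channel-wise concatenation of the
# pre- and post-change images (6 input channels for a pair of RGB images).
model = smp.Unet(encoder_name="resnet50", in_channels=6, classes=2)

pre = torch.randn(1, 3, 256, 256)   # image at time t1
post = torch.randn(1, 3, 256, 256)  # image at time t2
logits = model(torch.cat([pre, post], dim=1))  # -> (1, 2, 256, 256)
# The SiamConc/SiamDiff variants instead encode each image with a shared
# encoder and fuse the per-stage features before the U-Net decoder.
```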
## Model Checkpoints

Model checkpoints are uploaded to [Hugging Face](https://huggingface.co/isaaccorley/a-change-detection-reality-check)!
### LEVIR-CD
Model | Backbone | Precision | Recall | F1 | IoU | Checkpoint |
---|---|---|---|---|---|---|
U-Net | ResNet-50 | 0.9197 | 0.8795 | 0.8991 | 0.8167 | Checkpoint |
U-Net | EfficientNet-B4 | 0.9269 | 0.8588 | 0.8915 | 0.8044 | Checkpoint |
U-Net SiamConc | ResNet-50 | 0.9287 | 0.8749 | 0.9010 | 0.8199 | Checkpoint |
U-Net SiamDiff | ResNet-50 | 0.9321 | 0.8730 | 0.9015 | 0.8207 | Checkpoint |
### WHU-CD (using official train/test splits)
Model | Backbone | Precision | Recall | F1 | IoU | Checkpoint |
---|---|---|---|---|---|---|
U-Net SiamConc | ResNet-50 | 0.8369 | 0.8130 | 0.8217 | 0.7054 | Checkpoint |
U-Net SiamDiff | ResNet-50 | 0.8856 | 0.7741 | 0.8248 | 0.7086 | Checkpoint |
U-Net | ResNet-50 | 0.8865 | 0.7663 | 0.8200 | 0.7020 | Checkpoint |
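To use a checkpoint outside the training scripts, something like the following should work; note that the `filename` below is hypothetical, so browse the Hugging Face repository for the actual checkpoint paths.

```python
import torch
from huggingface_hub import hf_hub_download

# NOTE: the filename below is hypothetical -- check the Hugging Face repo
# for the actual checkpoint paths.
ckpt_path = hf_hub_download(
    repo_id="isaaccorley/a-change-detection-reality-check",
    filename="levircd-unet-resnet50.ckpt",  # hypothetical filename
)
state = torch.load(ckpt_path, map_location="cpu")
# Lightning checkpoints typically store weights under "state_dict".
state_dict = state.get("state_dict", state)
```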
## Reproducing Results
Download the LEVIR-CD and WHU-CD datasets and then use the following notebooks to chip the datasets into non-overlapping 256x256 patches:

- `scripts/preprocess_levircd.ipynb`
- `scripts/preprocess_whucd.ipynb`
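The notebooks perform the actual preprocessing; purely as an illustration of the chipping step (not the repo's exact logic), non-overlapping 256x256 patches can be cut like this:

```python
import numpy as np
from PIL import Image

def chip(path: str, size: int = 256):
    """Yield non-overlapping size x size patches, dropping any edge remainder."""
    img = np.array(Image.open(path))
    height, width = img.shape[:2]
    for y in range(0, height - size + 1, size):
        for x in range(0, width - size + 1, size):
            yield img[y : y + size, x : x + size]
```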
To train a U-Net on both datasets over 10 random seeds, run:
```bash
python train_levircd.py --train-root /path/to/preprocessed-dataset/ --model unet --backbone resnet50 --num_seeds 10
python train_whucd.py --train-root /path/to/preprocessed-dataset/ --model unet --backbone resnet50 --num_seeds 10
```
To evaluate a set of checkpoints and save the results to a `.csv` file, run:
```bash
python test_levircd.py --root /path/to/preprocessed-dataset/ --ckpt-root lightning_logs/ --output-filename metrics.csv
python test_whucd.py --root /path/to/preprocessed-dataset/ --ckpt-root lightning_logs/ --output-filename metrics.csv
```
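To aggregate the saved metrics across seeds, a short pandas snippet works; the column names below are an assumption, so inspect the CSV header for the real ones.

```python
import pandas as pd

df = pd.read_csv("metrics.csv")
# Column names are hypothetical -- check the CSV header produced by the test scripts.
print(df[["precision", "recall", "f1", "iou"]].agg(["mean", "std"]))
```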
## Citation
If this work inspired your change detection research, please consider citing our paper:
```bibtex
@article{corley2024change,
  title={A Change Detection Reality Check},
  author={Corley, Isaac and Robinson, Caleb and Ortiz, Anthony},
  journal={arXiv preprint arXiv:2402.06994},
  year={2024}
}
```