<center> Generative ConvNet Foundation Model with Sparse and Low-Frequency Filtered Masked Modeling for Remote Sensing Image Interpretation </center>
Introduction
This is the official repository for the paper "Generative ConvNet Foundation Model with Sparse and Low-Frequency Filtered Masked Modeling for Remote Sensing Image Interpretation".
Abstract: Foundation models offer a highly versatile and precise solution for intelligent interpretation of remote sensing images, thus greatly facilitating various remote sensing applications. Nevertheless, current foundation models for remote sensing predominantly employ vision transformers based on generative methods, with no corresponding exploration of ConvNets with masked image modeling (MIM). In this paper, we make the first attempt to propose a generative ConvNet foundation model tailored for remote sensing scenarios, which comprises two key components. Firstly, a large dataset named GeoSense, containing approximately nine million diverse remote sensing images, is constructed to enhance the robustness and generalization of the foundation model during the pre-training phase. Secondly, a sparse and low-frequency filtered masked modeling (SLFFM) self-supervised learning framework is designed for representation learning of the ConvNet foundation model. Specifically, we introduce sub-manifold sparse convolutions to enable the ConvNet to process variable-length sequences for MIM self-supervised pre-training. Additionally, a low-frequency filtered reconstruction target is designed to guide the model's attention towards essential ground-object features in remote sensing images, while mitigating unnecessary detail interference. To evaluate the general performance of our proposed foundation model, comprehensive experiments have been carried out on five datasets across three downstream tasks (i.e., object detection, semantic segmentation, and change detection). Experimental results demonstrate that our method consistently achieves state-of-the-art performance across all benchmark datasets and downstream tasks.
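To illustrate the low-frequency filtered reconstruction target, here is a minimal sketch of one plausible realization: an ideal low-pass filter applied in the 2-D Fourier domain, keeping only frequencies below a cutoff before the target is compared against the decoder output. The `cutoff` value and the use of an ideal (hard) circular mask are assumptions for illustration; they are not the paper's exact filter design.

```python
import numpy as np

def low_frequency_target(img: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    """Build a low-frequency reconstruction target by zeroing all spatial
    frequencies above `cutoff` (expressed as a fraction of the image band)."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    # Circular low-pass mask centered on the DC component.
    yy, xx = np.mgrid[:h, :w]
    cy, cx = h // 2, w // 2
    radius = cutoff * min(h, w) / 2
    mask = ((yy - cy) ** 2 + (xx - cx) ** 2) <= radius ** 2
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.real(filtered)

# Filtering removes fine high-frequency detail but keeps coarse structure,
# so the reconstruction loss focuses on essential ground-object layout.
rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224))
target = low_frequency_target(img)
```

In practice the same filtering would be applied per channel to the normalized input image, and the MIM loss would be computed between the decoder prediction and this filtered target on the masked patches only.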
Pre-trained and Fine-tuned Models
Pre-training
GeoSense
Method | Backbone | Input Size | Parameters | Pre-trained Model |
---|---|---|---|---|
SLFFM | ConvNeXt-Base | 224x224 | 89M | Weights |
SLFFM | ConvNeXt-Large | 224x224 | 198M | Weights |
Object Detection
Dota V1.0
Method | Pre-train | Backbone | Lr Schd | mAP | Config | Model |
---|---|---|---|---|---|---|
Oriented R-CNN | SLFFM | ConvNeXt-Base | 1x | 79.15 | Config | Weights |
Oriented R-CNN | SLFFM | ConvNeXt-Large | 1x | 79.33 | Config | Weights |
DIOR-R
Method | Pre-train | Backbone | Lr Schd | mAP | Config | Model |
---|---|---|---|---|---|---|
Oriented R-CNN | SLFFM | ConvNeXt-Base | 1x | 71.50 | Config | Weights |
Oriented R-CNN | SLFFM | ConvNeXt-Large | 1x | 72.33 | Config | Weights |
Semantic Segmentation
Potsdam
Method | Pre-train | Backbone | Lr Schd | OA | Config | Model |
---|---|---|---|---|---|---|
UperNet | SLFFM | ConvNeXt-Base | 160k | 91.72 | Config | Weights |
UperNet | SLFFM | ConvNeXt-Large | 160k | 91.82 | Config | Weights |
LoveDA
Method | Pre-train | Backbone | Lr Schd | mIoU | Config | Model |
---|---|---|---|---|---|---|
UperNet | SLFFM | ConvNeXt-Base | 160k | 52.59 | Config | Weights |
UperNet | SLFFM | ConvNeXt-Large | 160k | 53.03 | Config | Weights |
Change Detection
LEVIR-CD
Method | Pre-train | Backbone | Lr Schd | F1 | Config | Model |
---|---|---|---|---|---|---|
BIT | SLFFM | ConvNeXt-Base | 20k | 93.66 | Config | Weights |
BIT | SLFFM | ConvNeXt-Large | 20k | 93.89 | Config | Weights |
Usage
Environment
- python 3.8.13
- pytorch 1.12.1+cu113
- torchvision 0.13.1+cu113
- timm 0.6.12
- mmdet 2.28.2
- mmsegmentation 0.30.0
- opencd 0.0.3
Pre-training
torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=1234 main.py --data_path=${DataPath} --exp_name=${ExpName} --exp_dir=${ExpDir} --model=${Model} --bs=1024 --init_weight=${InitWeight}
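Conceptually, pre-training masks a random subset of image patches and lets sub-manifold sparse convolutions compute features only at the visible sites. The sketch below shows the patch-masking step; the patch size (32) and mask ratio (0.6) are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def random_patch_mask(img_size=224, patch_size=32, mask_ratio=0.6, seed=0):
    """Randomly mask a fixed ratio of non-overlapping patches, as in MIM.
    Returns a boolean (H/p, W/p) grid where True marks a masked patch."""
    n = img_size // patch_size
    num_patches = n * n
    num_masked = int(num_patches * mask_ratio)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_patches)[:num_masked]
    mask = np.zeros(num_patches, dtype=bool)
    mask[idx] = True
    return mask.reshape(n, n)

# The encoder treats the visible patches as a sparse set of active sites:
# a sub-manifold sparse convolution produces outputs only at these sites,
# so masked regions neither contribute to nor receive features.
mask = random_patch_mask()
visible = np.argwhere(~mask)  # (row, col) coordinates of active sites
```

This is what allows a ConvNet to consume the variable-length "sequence" of visible patches that MIM pre-training produces, mirroring how a ViT encoder simply drops masked tokens.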
Finetune on Object Detection
Train:
bash tools/dist_train.sh ${ConfigPath} 8
Test:
bash tools/dist_test.sh ${ConfigPath} ${CheckpointPath} 8 --format-only --eval-options submission_dir=${SubmissionDir}
Finetune on Semantic Segmentation
Train:
bash tools/dist_train.sh ${ConfigPath} 8
Test:
bash tools/dist_test.sh ${ConfigPath} ${CheckpointPath} 8 --eval mFscore mIoU
Finetune on Change Detection
Train:
bash tools/dist_train.sh ${ConfigPath} 8
Test:
bash tools/dist_test.sh ${ConfigPath} ${CheckpointPath} 8 --eval mFscore mIoU