<center> Generative ConvNet Foundation Model with Sparse and Low-Frequency Filtered Masked Modeling for Remote Sensing Image Interpretation </center>

## Introduction

This is the official repository for the paper "Generative ConvNet Foundation Model with Sparse and Low-Frequency Filtered Masked Modeling for Remote Sensing Image Interpretation".

**Abstract:** Foundation models offer a highly versatile and precise solution for the intelligent interpretation of remote sensing images, greatly facilitating various remote sensing applications. Nevertheless, current foundation models for remote sensing predominantly employ vision transformers trained with generative methods, and ConvNets with masked image modeling (MIM) remain unexplored. In this paper, we make the first attempt to propose a generative ConvNet foundation model tailored for remote sensing scenarios, which comprises two key components. First, a large dataset named GeoSense, containing approximately nine million diverse remote sensing images, is constructed to enhance the robustness and generalization of the foundation model during the pre-training phase. Second, a sparse and low-frequency filtered masked modeling (SLFFM) self-supervised learning framework is designed for representation learning of the ConvNet foundation model. Specifically, we introduce sub-manifold sparse convolutions to enable the ConvNet to process the variable-length sequences that arise in MIM self-supervised pre-training. Additionally, a low-frequency filtered reconstruction target is designed to guide the model's attention towards essential ground-object features in remote sensing images while mitigating interference from unnecessary detail. To evaluate the general performance of our proposed foundation model, comprehensive experiments have been carried out on five datasets across three downstream tasks (i.e., object detection, semantic segmentation, and change detection). Experimental results demonstrate that our method consistently achieves state-of-the-art performance across all benchmark datasets and downstream tasks.
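To give a rough feel for the low-frequency filtered reconstruction target described above, the sketch below applies an FFT low-pass filter to a single-channel image so that high-frequency detail is removed before it is used as a regression target. This is a minimal illustration, not the paper's implementation; the `cutoff_ratio` value and the circular filter shape are assumptions.

```python
import numpy as np

def low_frequency_target(img, cutoff_ratio=0.25):
    """Build a low-frequency reconstruction target for a 2-D image.

    Transforms the image to the frequency domain, keeps only the
    spectrum inside a circular low-pass mask, and transforms back.
    `cutoff_ratio` (fraction of the half-spectrum kept) is an
    illustrative assumption, not a value from the paper.
    """
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    # Circular low-pass mask centred on the zero-frequency bin.
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    mask = radius <= cutoff_ratio * min(h, w) / 2
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

img = np.random.default_rng(0).standard_normal((224, 224))
target = low_frequency_target(img)
print(target.shape)  # (224, 224)
```

During pre-training, the model's reconstruction loss would then be computed against `target` (restricted to masked patches) rather than against the raw pixels, steering learning towards coarse ground-object structure.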

*(Figure: overall flowchart of the proposed SLFFM framework.)*

## Pre-trained and Fine-tuned Models

### Pre-training

#### GeoSense

| Pretrain | Backbone | Input Size | Parameters | Pretrained Model |
| :------: | :------: | :--------: | :--------: | :--------------: |
| SLFFM | ConvNeXt-Base | 224x224 | 89M | Weights |
| SLFFM | ConvNeXt-Large | 224x224 | 198M | Weights |

### Object Detection

#### DOTA-v1.0

| Method | Pre-train | Backbone | Lr Schd | mAP | Config | Model |
| :----: | :-------: | :------: | :-----: | :-: | :----: | :---: |
| Oriented R-CNN | SLFFM | ConvNeXt-Base | 1x | 79.15 | Config | Weights |
| Oriented R-CNN | SLFFM | ConvNeXt-Large | 1x | 79.33 | Config | Weights |

#### DIOR-R

| Method | Pre-train | Backbone | Lr Schd | mAP | Config | Model |
| :----: | :-------: | :------: | :-----: | :-: | :----: | :---: |
| Oriented R-CNN | SLFFM | ConvNeXt-Base | 1x | 71.50 | Config | Weights |
| Oriented R-CNN | SLFFM | ConvNeXt-Large | 1x | 72.33 | Config | Weights |

### Semantic Segmentation

#### Potsdam

| Method | Pre-train | Backbone | Lr Schd | OA | Config | Model |
| :----: | :-------: | :------: | :-----: | :-: | :----: | :---: |
| UperNet | SLFFM | ConvNeXt-Base | 160k | 91.72 | Config | Weights |
| UperNet | SLFFM | ConvNeXt-Large | 160k | 91.82 | Config | Weights |

#### LoveDA

| Method | Pre-train | Backbone | Lr Schd | mIoU | Config | Model |
| :----: | :-------: | :------: | :-----: | :--: | :----: | :---: |
| UperNet | SLFFM | ConvNeXt-Base | 160k | 52.59 | Config | Weights |
| UperNet | SLFFM | ConvNeXt-Large | 160k | 53.03 | Config | Weights |

### Change Detection

#### LEVIR-CD

| Method | Pre-train | Backbone | Lr Schd | F1 | Config | Model |
| :----: | :-------: | :------: | :-----: | :-: | :----: | :---: |
| BIT | SLFFM | ConvNeXt-Base | 20k | 93.66 | Config | Weights |
| BIT | SLFFM | ConvNeXt-Large | 20k | 93.89 | Config | Weights |

## Usage

### Environment

### Pre-training

```shell
torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
  --master_addr=localhost --master_port=1234 main.py \
  --data_path=${DataPath} --exp_name=${ExpName} --exp_dir=${ExpDir} \
  --model=${Model} --bs=1024 --init_weight=${InitWeight}
```
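During MIM pre-training, a random subset of image patches is masked and only the visible patches are fed to the network; the sub-manifold sparse convolutions mentioned in the paper let the ConvNet consume this variable-length set of visible patches. The sketch below only illustrates the random patch-masking step; the patch size, mask ratio, and function name are illustrative assumptions, not values from the paper.

```python
import numpy as np

def random_patch_mask(img_size=224, patch_size=32, mask_ratio=0.6, seed=0):
    """Randomly mask a grid of patches for MIM pre-training.

    Returns a boolean (grid, grid) array where True marks a masked
    patch; the remaining visible patches form the sparse input that
    a sub-manifold sparse ConvNet would process. Patch size and mask
    ratio here are assumptions for illustration.
    """
    grid = img_size // patch_size          # patches per side
    n_patches = grid * grid
    n_masked = int(round(n_patches * mask_ratio))
    rng = np.random.default_rng(seed)
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, n_masked, replace=False)] = True
    return flat.reshape(grid, grid)

mask = random_patch_mask()
print(mask.shape, int(mask.sum()))  # (7, 7) 29
```

The reconstruction loss is then evaluated only at the masked positions, which is what makes the pre-training task non-trivial for the encoder.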

### Fine-tune on Object Detection

Train:

```shell
bash tools/dist_train.sh ${ConfigPath} 8
```

Test:

```shell
bash tools/dist_test.sh ${ConfigPath} ${CheckpointPath} 8 --format-only --eval-options submission_dir=${SubmissionDir}
```

### Fine-tune on Semantic Segmentation

Train:

```shell
bash tools/dist_train.sh ${ConfigPath} 8
```

Test:

```shell
bash tools/dist_test.sh ${ConfigPath} ${CheckpointPath} 8 --eval mFscore mIoU
```

### Fine-tune on Change Detection

Train:

```shell
bash tools/dist_train.sh ${ConfigPath} 8
```

Test:

```shell
bash tools/dist_test.sh ${ConfigPath} ${CheckpointPath} 8 --eval mFscore mIoU
```