<h1 align="center"> CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding </h1> <h5 align="center"><em>Dilxat Muhtar, Xueliang Zhang, Pengfeng Xiao, Zhenshi Li, and Feng Gu</em></h5> <p align="center"> <a href="#news">News</a> | <a href="#introduction">Introduction</a> | <a href="#models">Pre-trained Models</a> | <a href="#usage">Usage</a>| <a href="#acknowledgement">Acknowledgement</a> | <a href="#statement">Statement</a> </p > <p align="center"> <a href="https://arxiv.org/abs/2304.09670"><img src="https://img.shields.io/badge/Paper-arxiv-red"></a> <a href="https://ieeexplore.ieee.org/document/10105625"><img src="https://img.shields.io/badge/Paper-IEEE%20TGRS-red"></a> </p>

News

Introduction

This is the official repository for the paper “CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding”.

Abstract: Self-supervised learning (SSL) has gained widespread attention in the remote sensing (RS) and earth observation (EO) communities owing to its ability to learn task-agnostic representations without human-annotated labels. Nevertheless, most existing RS SSL methods are limited to learning either globally semantically separable or locally spatially perceptible representations. We argue that this learning strategy is suboptimal for RS, since the representations required by different RS downstream tasks are often varied and complex. In this study, we propose a unified SSL framework that is better suited to RS image representation learning. The proposed framework, Contrastive Mask Image Distillation (CMID), learns representations with both global semantic separability and local spatial perceptibility by combining contrastive learning (CL) with masked image modeling (MIM) in a self-distillation manner. Furthermore, CMID is architecture-agnostic: it is compatible with both convolutional neural networks (CNNs) and vision transformers (ViTs), so it can be easily adapted to a variety of deep learning (DL) applications for RS understanding. Comprehensive experiments on four downstream tasks (scene classification, semantic segmentation, object detection, and change detection) show that models pre-trained with CMID outperform other state-of-the-art SSL methods on multiple downstream tasks.

<figure> <div align="center"> <img src=Figure/CMID.png width="90%"> </div> </figure>

Models

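| Method | Backbone | Pre-trained Dataset | Pre-trained Epochs | Pre-trained Model | Backbone Only |
| --- | --- | --- | --- | --- | --- |
| CMID | ResNet-50 | MillionAID | 200 | NJU Box | NJU Box |
| CMID | Swin-B | MillionAID | 200 | NJU Box | NJU Box |
| CMID | ResNet-50 | Potsdam | 400 | NJU Box | NJU Box |
| CMID | Swin-B | Potsdam | 400 | NJU Box | NJU Box |
| BYOL | ResNet-50 | Potsdam | 400 | NJU Box | \ |
| Barlow-Twins | ResNet-50 | Potsdam | 400 | NJU Box | \ |
| MoCo-v2 | ResNet-50 | Potsdam | 400 | NJU Box | \ |
| MAE | ViT-B | Potsdam | 400 | NJU Box | \ |
| SimMIM | Swin-B | Potsdam | 400 | NJU Box | \ |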

Scene Classification (UCM 8:2)

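| Method | Backbone | Pre-trained Dataset | Pre-trained Epochs | OA (%) | Weights |
| --- | --- | --- | --- | --- | --- |
| CMID | ResNet-50 | MillionAID | 200 | 99.22 | NJU Box |
| CMID | Swin-B | MillionAID | 200 | 99.48 | NJU Box |
| BYOL | ResNet-50 | ImageNet | 200 | 99.22 | NJU Box |
| Barlow-Twins | ResNet-50 | ImageNet | 300 | 99.16 | NJU Box |
| MoCo-v2 | ResNet-50 | ImageNet | 200 | 97.92 | NJU Box |
| SwAV | ResNet-50 | ImageNet | 200 | 98.96 | NJU Box |
| SeCo | ResNet-50 | SeCo-1m | 200 | 97.66 | NJU Box |
| ResNet-50-SEN12MS | ResNet-50 | SEN12MS | 200 | 96.88 | NJU Box |
| MAE | ViT-B-RVSA | MillionAID | 1600 | 98.56 | NJU Box |
| MAE | ViTAE-B-RVSA | MillionAID | 1600 | 97.12 | NJU Box |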
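For readers reproducing the scene classification numbers, the following is a minimal sketch of an 8:2 train/test split and of overall accuracy (OA). It is not the repository's evaluation code: the helper names `split_indices` and `overall_accuracy`, the fixed seed, and the plain (non-stratified) shuffle are all assumptions made for illustration. The count of 2100 images reflects the UC Merced dataset (21 classes with 100 images each).

```python
import random


def split_indices(n_items, train_ratio=0.8, seed=0):
    """Shuffle indices 0..n_items-1 and split them into train/test lists.

    Hypothetical helper: a plain random split, not necessarily the
    per-class split used to produce the table above.
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_train = round(n_items * train_ratio)
    return indices[:n_train], indices[n_train:]


def overall_accuracy(y_true, y_pred):
    """Percentage of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# 2100 images split 8:2 gives 1680 training and 420 test images.
train_idx, test_idx = split_indices(2100)
print(len(train_idx), len(test_idx))

# Three of four toy predictions are correct.
print(overall_accuracy([0, 1, 2, 2], [0, 1, 2, 1]))
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing backbones on a test set of only a few hundred images.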

Semantic Segmentation

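| Method | Backbone | Pre-trained Dataset | Pre-trained Epochs | mIoU (Potsdam) | Weights (Potsdam) | mIoU (VH) | Weights (VH) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CMID | ResNet-50 | MillionAID | 200 | 87.35 | NJU Box | 79.44 | NJU Box |
| CMID | Swin-B | MillionAID | 200 | 88.36 | NJU Box | 80.01 | NJU Box |
| BYOL | ResNet-50 | ImageNet | 200 | 85.54 | NJU Box | 72.52 | NJU Box |
| Barlow-Twins | ResNet-50 | ImageNet | 300 | 83.16 | NJU Box | 71.86 | NJU Box |
| MoCo-v2 | ResNet-50 | ImageNet | 200 | 87.02 | NJU Box | 79.16 | NJU Box |
| SwAV | ResNet-50 | ImageNet | 200 | 85.74 | NJU Box | 73.76 | NJU Box |
| SeCo | ResNet-50 | SeCo-1m | 200 | 85.82 | NJU Box | 78.59 | NJU Box |
| ResNet-50-SEN12MS | ResNet-50 | SEN12MS | 200 | 83.17 | NJU Box | 73.99 | NJU Box |
| MAE | ViT-B-RVSA | MillionAID | 1600 | 86.37 | NJU Box | 77.29 | NJU Box |
| MAE | ViTAE-B-RVSA | MillionAID | 1600 | 86.61 | NJU Box | 78.17 | NJU Box |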
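As a reminder of what the mIoU columns measure, here is a minimal sketch that computes mean intersection-over-union from a confusion matrix. The function names are hypothetical, and the choice to skip classes that appear in neither the labels nor the predictions is an assumption; the evaluation behind the table may treat classes (for example a clutter class) differently.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count label pairs: matrix[t][p] is the number of pixels with
    true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Mean of per-class IoU = TP / (TP + FP + FN), in percent."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # true c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, true other
        denom = tp + fp + fn
        if denom > 0:  # assumption: ignore classes absent from both inputs
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no samples")
    return 100.0 * sum(ious) / len(ious)


# Flattened toy label maps with three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(confusion_matrix(y_true, y_pred, num_classes=3)))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is one half.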

Object Detection (DOTA V1.0 Dataset)

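| Method | Backbone | Pre-trained Dataset | Pre-trained Epochs | mAP | Weights |
| --- | --- | --- | --- | --- | --- |
| CMID | ResNet-50 | MillionAID | 200 | 76.63 | NJU Box |
| CMID | Swin-B | MillionAID | 200 | 77.36 | NJU Box |
| BYOL | ResNet-50 | ImageNet | 200 | 73.62 | NJU Box |
| Barlow-Twins | ResNet-50 | ImageNet | 300 | 67.54 | NJU Box |
| MoCo-v2 | ResNet-50 | ImageNet | 200 | 73.25 | NJU Box |
| SwAV | ResNet-50 | ImageNet | 200 | 73.30 | NJU Box |
| MAE | ViT-B-RVSA | MillionAID | 1600 | 78.08 | NJU Box |
| MAE | ViTAE-B-RVSA | MillionAID | 1600 | 76.96 | NJU Box |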

Change Detection (CDD Dataset)

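| Method | Backbone | Pre-trained Dataset | Pre-trained Epochs | mF1 | Weights |
| --- | --- | --- | --- | --- | --- |
| CMID | ResNet-50 | MillionAID | 200 | 96.95 | NJU Box |
| CMID | Swin-B | MillionAID | 200 | 97.11 | NJU Box |
| BYOL | ResNet-50 | ImageNet | 200 | 96.30 | NJU Box |
| Barlow-Twins | ResNet-50 | ImageNet | 300 | 95.63 | NJU Box |
| MoCo-v2 | ResNet-50 | ImageNet | 200 | 96.05 | NJU Box |
| SwAV | ResNet-50 | ImageNet | 200 | 95.89 | NJU Box |
| SeCo | ResNet-50 | SeCo-1m | 200 | 96.26 | NJU Box |
| ResNet-50-SEN12MS | ResNet-50 | SEN12MS | 200 | 95.88 | NJU Box |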
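The table does not spell out how mF1 is computed. One common reading, assumed here, is the mean of the per-class F1 scores over the "unchanged" (0) and "changed" (1) classes. The sketch below follows that assumption with hypothetical helper names; check the paper before relying on it to reproduce the reported values.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1(y_true, y_pred, labels=(0, 1)):
    """Average the F1 score of each label, in percent.

    Assumption: mF1 is the unweighted mean over the unchanged (0)
    and changed (1) classes.
    """
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(f1_score(tp, fp, fn))
    return 100.0 * sum(scores) / len(scores)


# Toy change map: 1 marks a changed pixel, 0 an unchanged one.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(mean_f1(y_true, y_pred))
```

Here the unchanged class scores 0.75 and the changed class 0.5, which shows how averaging over both classes keeps the abundant unchanged pixels from hiding a weak change detector.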

Usage

Acknowledgement

Statement