Multi Pretext Masked Autoencoder (MP-MAE)

Project Website · Paper · Code - Data

This repository contains the code used to create the models and results presented in the paper MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning. It adapts the ConvNeXt V2 architecture to work with MMEarth, a multi-modal geospatial remote sensing dataset.

📢 Latest Updates

:fire::fire::fire: Last Updated on 2024.08.07 :fire::fire::fire:


Installation

See INSTALL.md for instructions on installing the dependencies.

Training

See TRAINING.md for more details on training and finetuning.

Model Checkpoints

All pretraining weights can be downloaded from here. The folders are named in the format shown below, and inside each folder you will find a checkpoint .pth weight file. A full example of loading the weights is in the examples folder; a minimal sketch follows.
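The sketch below assumes a standard PyTorch checkpoint dict with the weights stored under a "model" key; the folder name follows the format described in the next section, and the .pth filename is a placeholder. The examples folder has the canonical loading code.

```python
import torch

# Placeholder path: the folder name follows the format below,
# and the exact .pth filename depends on the downloaded checkpoint.
ckpt_path = "pt-all_mod_atto_1M_64_uncertainty_56_8/checkpoint.pth"

# Load onto CPU first; move to GPU after building the model.
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Assumption: the pretrained weights sit under a "model" key, as is
# common for ConvNeXt V2-style training scripts.
state_dict = checkpoint.get("model", checkpoint)

# `encoder` would be a ConvNeXt V2 model (e.g. atto) built to match
# the checkpoint; strict=False skips decoder/pretext-task weights
# that are not needed for finetuning.
# encoder.load_state_dict(state_dict, strict=False)
```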

```
CHECKPOINT FOLDER FORMAT
pt-($INPUT)_($MODEL)_($DATA)_($LOSS)_($MODEL_IMG_SIZE)_($PATCH_SIZE)/

$INPUT:
      - S2 # for s2-12 bands as input and output
      - all_mod # for s2-12 bands as input and all modalities as output
      - img_mod # for s2-12 bands as input and image level modalities as output
      - pix_mod # for s2-12 bands as input and pixel level modalities as output
      - rgb # for s2-bgr as input and output (we trained the model using bgr ordering)

$MODEL:
      - atto
      - tiny

$DATA:
      - 100k_128 # MMEarth100k, 100k locations and image size 128
      - 1M_64 # MMEarth64, 1.2M locations and image size 64
      - 1M_128 # MMEarth, 1.2M locations and image size 128

$LOSS: # loss weighting strategy
      - uncertainty
      - unweighted

$MODEL_IMG_SIZE: # input size passed to the model
      - 56 # when using the data with image size 64
      - 112 # when using the data with image size 128

$PATCH_SIZE:
      - 8
      - 16
```

Note: The only exception is the model trained on ImageNet, whose folder is pt-imagenet_atto_200epochs_224_32/.
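As an illustration of the naming scheme, the folder for an Atto model pretrained on MMEarth64 with all modalities and uncertainty weighting can be assembled as follows (a sketch; all values come from the lists above):

```python
# Assemble a checkpoint folder name from its components (illustrative).
input_mod = "all_mod"   # s2-12 bands as input, all modalities as output
model = "atto"
data = "1M_64"          # MMEarth64: 1.2M locations, image size 64
loss = "uncertainty"
model_img_size = 56     # image size 64 -> model input 56
patch_size = 8

folder = f"pt-{input_mod}_{model}_{data}_{loss}_{model_img_size}_{patch_size}/"
print(folder)  # pt-all_mod_atto_1M_64_uncertainty_56_8/
```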

A detailed overview of each checkpoint is shown in the table below.

| INPUT | OUTPUT | MODEL | DATASET | LOSS | MODEL_IMG_SIZE | PATCH_SIZE | CKPT |
|-------|--------|-------|---------|------|----------------|------------|------|
| S2 12 band | all modalities | Atto | MMEarth64 | Uncertainty | 56x56 | 8x8 | download |
| S2 12 band | all modalities | Atto | MMEarth64 | Unweighted | 56x56 | 8x8 | download |
| S2 12 band | all modalities | Atto | MMEarth | Uncertainty | 112x112 | 16x16 | download |
| S2 12 band | all modalities | Tiny | MMEarth64 | Uncertainty | 56x56 | 8x8 | download |
| S2 12 band | all modalities | Atto | MMEarth100k | Uncertainty | 112x112 | 16x16 | download |
| S2 12 band | image level modalities | Atto | MMEarth64 | Uncertainty | 56x56 | 8x8 | download |
| S2 12 band | pixel level modalities | Atto | MMEarth64 | Uncertainty | 56x56 | 8x8 | download |
| S2 12 band | S2 12 band | Atto | MMEarth64 | Uncertainty | 56x56 | 8x8 | download |
| S2 bgr | S2 bgr | Atto | MMEarth64 | Uncertainty | 56x56 | 8x8 | download |
| S2 bgr | S2 bgr | Atto | MMEarth | Uncertainty | 128x128 | 16x16 | download |

Acknowledgment

This repository borrows from the ConvNeXt V2 repository.

Citation

Please cite our paper if you use this code or any of the provided data.

Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, & Nico Lang (2024). MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning.

```bibtex
@misc{nedungadi2024mmearth,
      title={MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning},
      author={Vishal Nedungadi and Ankit Kariryaa and Stefan Oehmcke and Serge Belongie and Christian Igel and Nico Lang},
      year={2024},
      eprint={2405.02771},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```