<div align="center"> <h1>HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model</h1>

Di Wang<sup>1 ∗</sup>, Meiqi Hu<sup>1 ∗</sup>, Yao Jin<sup>1 ∗</sup>, Yuchun Miao<sup>1 ∗</sup>, Jiaqi Yang<sup>1 ∗</sup>, Yichu Xu<sup>1 ∗</sup>, Xiaolei Qin<sup>1 ∗</sup>, Jiaqi Ma<sup>1 ∗</sup>, Lingyu Sun<sup>1 ∗</sup>, Chenxing Li<sup>1 ∗</sup>, Chuan Fu<sup>2</sup>, Hongruixuan Chen<sup>3</sup>, Chengxi Han<sup>1 †</sup>, Naoto Yokoya<sup>3</sup>, Jing Zhang<sup>1 †</sup>, Minqiang Xu<sup>4</sup>, Lin Liu<sup>4</sup>, Lefei Zhang<sup>1</sup>, Chen Wu<sup>1 †</sup>, Bo Du<sup>1 †</sup>, Dacheng Tao<sup>5</sup>, Liangpei Zhang<sup>1 †</sup>

<sup>1</sup> Wuhan University, <sup>2</sup> Chongqing University, <sup>3</sup> The University of Tokyo, <sup>4</sup> National Engineering Research Center of Speech and Language Information Processing, <sup>5</sup> Nanyang Technological University.

<sup>∗</sup> Equal contribution, <sup>†</sup> Corresponding author

</div> <div align="center"> <!-- [![arXiv paper](https://img.shields.io/badge/arXiv-2406.11519-b31b1b.svg)](https://arxiv.org/abs/2406.11519) -->


</div> <p align="center"> <a href="#-update">Update</a> | <a href="#-overview">Overview</a> | <a href="#-datasets">Datasets</a> | <a href="#-pretrained-models">Pretrained Models</a> | <a href="#-usage">Usage</a> | <a href="#-statement">Statement</a> </p> <figure> <div align="center"> <img src=Fig/logo.png width="20%"> </div> </figure>

🔥 Update

2024.10.22

2024.07.18

2024.06.18

🌞 Overview

HyperSIGMA is the first billion-parameter-scale foundation model specifically designed for hyperspectral image (HSI) interpretation. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features with a specially designed spectral enhancement module.

<figure> <div align="center"> <img src=Fig/framework.png width="80%"> </div> <div align='center'>

Figure 1. Framework of HyperSIGMA.

</div> </figure> <br>
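For readers who want a concrete picture of the SSA idea, below is a minimal, illustrative PyTorch sketch of a sparse-sampling-style attention block, in which each query attends to a small set of adaptively sampled locations rather than the full token grid. The class name, arguments, and offset-prediction details here are hypothetical and are not taken from this repository; please refer to the released code for the actual SSA implementation.

```python
# Illustrative sketch only -- NOT the official HyperSIGMA SSA implementation.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseSamplingAttention(nn.Module):
    """Each query attends to a few sampled locations instead of the full grid."""
    def __init__(self, dim, num_heads=8, num_points=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.num_points = num_heads, num_points
        self.head_dim = dim // num_heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        # Each query predicts a few (x, y) sampling offsets per head.
        self.offsets = nn.Linear(dim, num_heads * num_points * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, H, W):
        # x: (B, N, C) tokens laid out on an H x W grid, N = H * W.
        B, N, C = x.shape
        h, p, d = self.num_heads, self.num_points, self.head_dim

        q = self.q(x).view(B, N, h, d).permute(0, 2, 1, 3).reshape(B * h, N, d)
        k, v = self.kv(x).chunk(2, dim=-1)
        # Reshape keys/values into per-head feature maps for grid_sample.
        k = k.view(B, H, W, h, d).permute(0, 3, 4, 1, 2).reshape(B * h, d, H, W)
        v = v.view(B, H, W, h, d).permute(0, 3, 4, 1, 2).reshape(B * h, d, H, W)

        # Reference position of every query in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, H, device=x.device)
        xs = torch.linspace(-1, 1, W, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        ref = torch.stack([gx, gy], dim=-1).view(1, N, 1, 1, 2)

        # Sparse sampling locations = reference position + learned offsets.
        off = self.offsets(x).tanh().view(B, N, h, p, 2)
        loc = (ref + off).permute(0, 2, 1, 3, 4).reshape(B * h, N, p, 2)

        # Bilinearly sample a handful of keys/values for each query.
        k_s = F.grid_sample(k, loc, align_corners=True).permute(0, 2, 3, 1)  # (B*h, N, p, d)
        v_s = F.grid_sample(v, loc, align_corners=True).permute(0, 2, 3, 1)

        attn = ((q.unsqueeze(2) * k_s).sum(-1) / math.sqrt(d)).softmax(dim=-1)  # (B*h, N, p)
        out = (attn.unsqueeze(-1) * v_s).sum(2)                                 # (B*h, N, d)
        out = out.view(B, h, N, d).permute(0, 2, 1, 3).reshape(B, N, C)
        return self.proj(out)

# Example: 2 images, a 16x16 token grid, 64-dim tokens.
tokens = torch.randn(2, 16 * 16, 64)
ssa = SparseSamplingAttention(dim=64)
print(ssa(tokens, H=16, W=16).shape)  # torch.Size([2, 256, 64])
```

Restricting each query to a handful of sampled keys and values is what lets such a block sidestep the heavy spatial and spectral redundancy of HSI tokens while still gathering diverse context.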

Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared with current state-of-the-art methods. It outperforms advanced models such as SpectralGPT, and even methods specifically designed for these tasks.

<figure> <div align="center"> <img src=Fig/radarimg.png width="80%"> </div> </figure>

Figure 2. HyperSIGMA demonstrates superior performance across 16 datasets and 7 tasks, including both high-level and low-level hyperspectral tasks, as well as multispectral scenes.

📖 Datasets

To train the foundation model, we collected hyperspectral remote sensing image samples from around the globe, constructing a large-scale hyperspectral dataset named HyperGlobal-450K for pre-training. HyperGlobal-450K contains over 20 million three-band images, far exceeding the scale of existing hyperspectral datasets.

<figure> <div align="center"> <img src=Fig/dataset.png width="80%"> </div> </figure>

Figure 3. The distribution of HyperGlobal-450K samples across the globe, comprising 1,701 images (1,486 EO-1 and 215 GF-5B) with hundreds of spectral bands.
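As a rough illustration of how a single hyperspectral cube expands into many three-band images (the unit used in the 20-million count above), here is a small NumPy sketch. The band-grouping rule (consecutive triplets with a fixed stride) is an assumption for illustration only, not necessarily how HyperGlobal-450K was actually constructed.

```python
# Illustrative sketch: view one hyperspectral cube as many three-band sub-images.
import numpy as np

def three_band_views(cube: np.ndarray, stride: int = 1) -> np.ndarray:
    """cube: (bands, H, W) hyperspectral image.
    Returns (num_views, 3, H, W): every consecutive band triplet at the given stride."""
    bands = cube.shape[0]
    starts = range(0, bands - 2, stride)
    return np.stack([cube[s:s + 3] for s in starts])

# Example: a 224x224 patch with 200 usable bands yields dozens of three-band views.
cube = np.random.rand(200, 224, 224).astype(np.float32)
views = three_band_views(cube, stride=3)
print(views.shape)  # (66, 3, 224, 224)
```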

🚀 Pretrained Models

| Pretrain | Backbone | Model Weights |
| :------: | :------: | :-----------: |
| Spatial_MAE | ViT-B | Baidu Drive & Hugging Face |
| Spatial_MAE | ViT-L | Baidu Drive & Hugging Face |
| Spatial_MAE | ViT-H | Baidu Drive & Hugging Face |
| Spectral_MAE | ViT-B | Baidu Drive & Hugging Face |
| Spectral_MAE | ViT-L | Baidu Drive & Hugging Face |
| Spectral_MAE | ViT-H | Baidu Drive & Hugging Face |
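After downloading a checkpoint, it can typically be inspected and loaded into a ViT backbone along the following lines. This is only a sketch: the file name is a placeholder, the `timm` backbone is used purely for illustration, and the task-specific READMEs below show the loading code actually used by each downstream pipeline.

```python
# A minimal sketch, assuming an MAE-style checkpoint whose weights sit under a 'model' key.
# 'spat_mae_b_checkpoint.pth' is a placeholder for whichever file you download above.
import torch
import timm

backbone = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=0)
ckpt = torch.load('spat_mae_b_checkpoint.pth', map_location='cpu')
state = ckpt.get('model', ckpt)

# strict=False tolerates decoder weights in the checkpoint and missing head weights.
msg = backbone.load_state_dict(state, strict=False)
print('missing:', len(msg.missing_keys), 'unexpected:', len(msg.unexpected_keys))
```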

🔨 Usage

Pretraining

We pretrain HyperSIGMA with SLURM. Here is an example of pretraining the large version of the Spatial ViT:

```
# Launches 64 tasks (16 nodes x 4 DCUs per node) on the xahdnormal partition; adjust to your cluster.
srun -J spatmae -p xahdnormal --gres=dcu:4 --ntasks=64 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python main_pretrain_Spat.py \
--model 'spat_mae_l' --norm_pix_loss \
--data_path [pretrain data path] \
--output_dir [model saved path] \
--log_dir [log saved path] \
--blr 1.5e-4 --batch_size 32 --gpu_num 64 --port 60001
```

Here is another example, pretraining the huge version of the Spectral ViT:

```
srun -J specmae -p xahdnormal --gres=dcu:4 --ntasks=128 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python main_pretrain_Spec.py \
--model 'spec_mae_h' --norm_pix_loss \
--data_path [pretrain data path] \
--output_dir [model saved path] \
--log_dir [log saved path] \
--blr 1.5e-4 --batch_size 16 --gpu_num 128 --port 60004 --epochs 1600 --mask_ratio 0.75 \
--use_ckpt 'True'
```

Training can be resumed from a saved checkpoint by adding `--resume`:

```
--resume [path of saved model]
```

Finetuning

Image Classification:

Please refer to ImageClassification-README.

Target Detection & Anomaly Detection:

Please refer to HyperspectralDetection-README.

Change Detection:

Please refer to ChangeDetection-README.

Spectral Unmixing:

Please refer to HyperspectralUnmixing-README.

Denoising:

Please refer to Denoising-README.

Super-Resolution:

Please refer to SR-README.

Multispectral Change Detection:

Please refer to MultispectralCD-README.

⭐ Citation

If you find HyperSIGMA helpful, please consider giving this repo a ⭐ and citing:

```
@article{hypersigma,
  title={HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model},
  author={Wang, Di and Hu, Meiqi and Jin, Yao and Miao, Yuchun and Yang, Jiaqi and Xu, Yichu and Qin, Xiaolei and Ma, Jiaqi and Sun, Lingyu and Li, Chenxing and Fu, Chuan and Chen, Hongruixuan and Han, Chengxi and Yokoya, Naoto and Zhang, Jing and Xu, Minqiang and Liu, Lin and Zhang, Lefei and Wu, Chen and Du, Bo and Tao, Dacheng and Zhang, Liangpei},
  journal={arXiv preprint arXiv:2406.11519},
  year={2024}
}
```

🎺 Statement

For any other questions, please contact di.wang at gmail.com or whu.edu.cn, and chengxi.han at whu.edu.cn.

💖 Thanks

This project is based on MMCV, MAE, Swin Transformer, VSA, RVSA, DAT, HTD-IRN, GT-HAD, MSDformer, SST-Former, SST, CNNAEU and DeepTrans. Thanks for their wonderful work!<br>