
<div align="center"> <h1>MSVMamba </h1> <h3>Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model</h3>

Paper: <a href="https://arxiv.org/abs/2405.14174">arXiv:2405.14174</a>

</div>

## Updates

## Introduction

MSVMamba is a visual state space model that introduces a hierarchy-in-hierarchy design to the VMamba model. This repository contains the code for training and evaluating MSVMamba models on ImageNet-1K for image classification, COCO for object detection, and ADE20K for semantic segmentation. For more details, please refer to our paper.

<p align="center"> <img src="./assets/ms2d.jpg" width="800" /> </p>

## Main Results

### Classification on ImageNet-1K

| name | pretrain | resolution | acc@1 | #params | FLOPs | logs&ckpts |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| MSVMamba-Nano | ImageNet-1K | 224x224 | 77.3 | 7M | 0.9G | log&ckpt |
| MSVMamba-Micro | ImageNet-1K | 224x224 | 79.8 | 12M | 1.5G | log&ckpt |
| MSVMamba-Tiny | ImageNet-1K | 224x224 | 82.8 | 33M | 4.6G | log&ckpt |

### Object Detection on COCO

| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs&ckpts |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| MSVMamba-Micro | 32M | 201G | MaskRCNN@1x | 43.8 | 39.9 | log&ckpt |
| MSVMamba-Tiny | 53M | 252G | MaskRCNN@1x | 46.9 | 42.2 | log&ckpt |
| MSVMamba-Micro | 32M | 201G | MaskRCNN@3x | 46.3 | 41.8 | log&ckpt |
| MSVMamba-Tiny | 53M | 252G | MaskRCNN@3x | 48.3 | 43.2 | log&ckpt |

### Semantic Segmentation on ADE20K

| Backbone | Input | #params | FLOPs | Segmentor | mIoU (SS) | mIoU (MS) | logs&ckpts |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| MSVMamba-Micro | 512x512 | 42M | 875G | UperNet@160k | 45.1 | 45.4 | log&ckpt |
| MSVMamba-Tiny | 512x512 | 65M | 942G | UperNet@160k | 47.8 | - | log&ckpt |

## Getting Started

Environment setup, training, and evaluation for MSVMamba follow the same steps as VMamba.

### Installation

**Step 1:** Clone the MSVMamba repository:

```bash
git clone https://github.com/YuHengsss/MSVMamba.git
cd MSVMamba
```

**Step 2:** Environment Setup:

Create and activate a new conda environment

```bash
conda create -n msvmamba
conda activate msvmamba
```
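Note that `conda create -n msvmamba` without a `python=` spec creates an environment with no interpreter of its own, so `pip` may resolve to the base environment. A pinned variant is sketched below; Python 3.10 mirrors the commented-out Windows setup further down, but treat the exact version as an assumption.

```bash
# Optional: pin an interpreter so pip installs into this env, not the base one
# (3.10 is an assumption taken from the Windows notes below; adjust as needed)
conda create -n msvmamba python=3.10 -y
conda activate msvmamba
```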

Install Dependencies

```bash
pip install -r requirements.txt
cd kernels/selective_scan && pip install .
```
<!-- cd kernels/cross_scan && pip install . -->
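Optionally, a quick smoke test can confirm that PyTorch is a CUDA build (required to compile and run the kernel) and that the kernel package was installed. The exact distribution name comes from the kernel's `setup.py`, so the `grep` below is a loose match rather than an exact name.

```bash
# PyTorch must be a CUDA build for the selective_scan kernel
python -c "import torch; print(torch.__version__, 'CUDA available:', torch.cuda.is_available())"
# Loosely match the installed kernel package (exact name depends on kernels/selective_scan/setup.py)
pip list | grep -i selective
```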

Dependencies for Detection and Segmentation (optional)

```bash
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
```
<!--
conda create -n cu12 python=3.10 -y && conda activate cu12
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121  # install cuda121 for windows
# install https://visualstudio.microsoft.com/visual-cpp-build-tools/
pip install timm==0.4.12 fvcore packaging
-->
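To verify that the pinned OpenMMLab stack resolved correctly, the versions can be printed in one line (note that `mmsegmentation` imports as `mmseg`):

```bash
# Should print versions matching the pins above
python -c "import mmengine, mmcv, mmdet, mmseg; print(mmengine.__version__, mmcv.__version__, mmdet.__version__, mmseg.__version__)"
```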

### Quick Start

#### Classification

To train MSVMamba models for classification on ImageNet, use the following command with the appropriate configuration file:

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
```
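Note that `torch.distributed.launch` is deprecated in recent PyTorch releases; an equivalent `torchrun` invocation with the same arguments is sketched below (paths remain placeholders).

```bash
# torchrun equivalent of the launcher above (PyTorch >= 1.10); substitute real paths
torchrun --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 \
    main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
```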

If you only want to test the performance (together with #params and FLOPs):

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
```

#### Detection and Segmentation

To evaluate with mmdetection or mmsegmentation:

```bash
# The trailing argument is the number of GPUs
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1
```

Add `--tta` to report the multi-scale mIoU (MS) for segmentation.
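For example, multi-scale evaluation of a segmentation model on one GPU looks like the sketch below (the config and checkpoint paths are placeholders, as above):

```bash
# Append --tta after the GPU count; substitute real paths
bash ./tools/dist_test.sh </path/to/seg/config> </path/to/checkpoint> 1 --tta
```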

To train with mmdetection or mmsegmentation:

```bash
# The trailing argument is the number of GPUs
bash ./tools/dist_train.sh </path/to/config> 8
```

## Citation

If MSVMamba is helpful for your research, please cite the following paper:

```bibtex
@article{shi2024multiscale,
  title={Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model},
  author={Yuheng Shi and Minjing Dong and Chang Xu},
  journal={arXiv preprint arXiv:2405.14174},
  year={2024}
}
```

## Acknowledgment

This project is based on VMamba (paper, code), Mamba (paper, code), Swin Transformer (paper, code), ConvNeXt (paper, code), and OpenMMLab. Thanks for their excellent works.