
<div align="center"> <h1>VSSD </h1> <h3>VSSD: Vision Mamba with Non-Causal State Space Duality</h3>

Paper: (arXiv:2407.18559)

</div>

Updates

Introduction

Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherent causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, a non-causal formulation of SSD. This repository contains the code for training and evaluating VSSD variants on the ImageNet-1K dataset for image classification, the COCO dataset for object detection, and the ADE20K dataset for semantic segmentation. For more information, please refer to our paper.

<p align="center"> <img src="./assets/overall_arc.jpg" width="800" /> </p>

Main Results

Classification on ImageNet-1K

| name | pretrain | resolution | acc@1 (%) | #params | FLOPs | logs | ckpts |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log | ckpt |
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log | ckpt |

Models enhanced with MESA:

| name | pretrain | resolution | acc@1 (%) | #params | FLOPs | logs | ckpts |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| VSSD-Tiny | ImageNet-1K | 224x224 | 84.1 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | log | ckpt |

Object Detection on COCO

| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs | ckpts |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log | ckpt |
| VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@3x | 50.0 | 44.6 | - | ckpt |

Semantic Segmentation on ADE20K

| Backbone | Input | #params | FLOPs | Segmentor | mIoU (SS) | mIoU (MS) | logs | ckpts |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log | ckpt |
| VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log | ckpt |

Getting Started

Installation

Step 1: Clone the VSSD repository:

```bash
git clone https://github.com/YuHengsss/VSSD.git
cd VSSD
```

Step 2: Environment Setup:

Create and activate a new conda environment

```bash
conda create -n VSSD
conda activate VSSD
```

Install Dependencies

```bash
pip install -r requirements.txt
```
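After the dependencies are installed, you can optionally confirm that PyTorch was built with CUDA support and can see a GPU; this one-liner is a quick sanity check rather than part of the original setup steps:

```bash
# Optional sanity check: print the installed PyTorch version and whether CUDA is available.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```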

Dependencies for Detection and Segmentation (optional)

```bash
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
```
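If the OpenMMLab packages installed cleanly, an optional import check can confirm the pinned versions before moving on to detection or segmentation experiments:

```bash
# Optional: verify the OpenMMLab stack imports and matches the pinned versions above.
python -c "import mmengine, mmcv, mmdet, mmseg; print(mmengine.__version__, mmcv.__version__, mmdet.__version__, mmseg.__version__)"
```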

Quick Start

Classification

To train VSSD models for classification on ImageNet-1K, use the following command with the desired configuration file:

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
```
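For example, training VSSD-Tiny on 8 GPUs could look like the sketch below (an effective batch size of 8 x 128 = 1024, assuming `--batch-size` is per GPU as in the Swin-Transformer codebase this project builds on); the config filename and dataset path are placeholders and should be replaced with the actual files in this repository:

```bash
# Hypothetical example: the config path and data path below are placeholders.
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 \
    --master_addr="127.0.0.1" --master_port=29501 main.py \
    --cfg configs/vssd/vssd_tiny_224.yaml \
    --batch-size 128 --data-path /data/imagenet --output /tmp
```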

If you only want to evaluate a trained model (and report its parameter count and FLOPs):

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
```
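A filled-in evaluation call might look like the following; again, the config, dataset, and checkpoint paths are placeholders for the actual files and released checkpoints:

```bash
# Hypothetical example: substitute the real config and a downloaded checkpoint.
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 \
    --master_addr="127.0.0.1" --master_port=29501 main.py \
    --cfg configs/vssd/vssd_tiny_224.yaml \
    --batch-size 128 --data-path /data/imagenet --output /tmp \
    --resume checkpoints/vssd_tiny.pth --eval
```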

Detection and Segmentation

To evaluate with mmdetection or mmsegmentation:

```bash
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1
```

Add `--tta` to obtain the multi-scale mIoU (MS) for segmentation.
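For instance, assuming the standard OpenMMLab `dist_test.sh` script, which forwards extra arguments to `test.py`, a multi-scale evaluation of the VSSD-Tiny segmentation model on a single GPU might look like the sketch below; the config and checkpoint paths are placeholders:

```bash
# Hypothetical example: replace the config and checkpoint with the actual released files.
bash ./tools/dist_test.sh segmentation/configs/vssd/upernet_vssd_tiny_512x512_160k_ade20k.py \
    checkpoints/upernet_vssd_tiny.pth 1 --tta
```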

To train with mmdetection or mmsegmentation:

```bash
bash ./tools/dist_train.sh </path/to/config> 8
```
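As a concrete sketch, training the Mask R-CNN 1x detector with a VSSD-Tiny backbone on 8 GPUs might look like this; the config filename is a placeholder for the actual one shipped with the repository:

```bash
# Hypothetical example: the config path is a placeholder.
bash ./tools/dist_train.sh detection/configs/vssd/mask_rcnn_vssd_tiny_fpn_1x_coco.py 8
```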

Citation

If VSSD is helpful for your research, please cite the following paper:

```bibtex
@article{shi2024vssd,
  title={VSSD: Vision Mamba with Non-Causal State Space Duality},
  author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
  journal={arXiv preprint arXiv:2407.18559},
  year={2024}
}
```

Acknowledgment

This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab; thanks for their excellent work.