
<div align="center"> <h1>VSSD </h1> <h3>VSSD: Vision Mamba with Non-Causal State Space Duality</h3>

Paper: (arXiv:2407.18559)

</div>

Updates

Introduction

Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherent causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, a non-causal formulation of SSD. This repository contains the code for training and evaluating VSSD variants on the ImageNet-1K dataset for image classification, the COCO dataset for object detection, and the ADE20K dataset for semantic segmentation. For more information, please refer to our paper.

<p align="center"> <img src="./assets/overall_arc.jpg" width="800" /> </p>

Main Results

Classification on ImageNet-1K

| name | pretrain | resolution | acc@1 (%) | #params | FLOPs | logs | ckpts |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log | ckpt |
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log | ckpt |

Models enhanced with MESA:

| name | pretrain | resolution | acc@1 (%) | #params | FLOPs | logs | ckpts |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| VSSD-Tiny | ImageNet-1K | 224x224 | 84.1 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | log | ckpt |

Object Detection on COCO

| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs | ckpts |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log | ckpt |
| VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@3x | 50.0 | 44.6 | - | ckpt |

Semantic Segmentation on ADE20K

| Backbone | Input | #params | FLOPs | Segmentor | mIoU (SS) | mIoU (MS) | logs | ckpts |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log | ckpt |
| VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log | ckpt |

Getting Started

Installation

Step 1: Clone the VSSD repository:

```bash
git clone https://github.com/YuHengsss/VSSD.git
cd VSSD
```

Step 2: Environment Setup:

Create and activate a new conda environment

```bash
conda create -n VSSD
conda activate VSSD
```

Install Dependencies

```bash
pip install -r requirements.txt
```
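After the dependencies are installed, you can optionally confirm that PyTorch was built with CUDA support and can see a GPU; this one-liner is a quick sanity check rather than part of the original setup steps:

```bash
# Optional sanity check: print the installed PyTorch version and whether CUDA is available.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```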

Dependencies for Detection and Segmentation (optional)

```bash
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
```
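If the OpenMMLab packages installed cleanly, an optional import check can confirm the pinned versions before moving on to detection or segmentation experiments:

```bash
# Optional: verify the OpenMMLab stack imports and matches the pinned versions above.
python -c "import mmengine, mmcv, mmdet, mmseg; print(mmengine.__version__, mmcv.__version__, mmdet.__version__, mmseg.__version__)"
```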

Quick Start

Classification

To train VSSD models for classification on ImageNet-1K, use the following command with the desired configuration file:

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
```
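For example, training VSSD-Tiny on 8 GPUs could look like the sketch below (an effective batch size of 8 x 128 = 1024, assuming `--batch-size` is per GPU as in the Swin-Transformer codebase this project builds on); the config filename and dataset path are placeholders and should be replaced with the actual files in this repository:

```bash
# Hypothetical example: the config path and data path below are placeholders.
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 \
    --master_addr="127.0.0.1" --master_port=29501 main.py \
    --cfg configs/vssd/vssd_tiny_224.yaml \
    --batch-size 128 --data-path /data/imagenet --output /tmp
```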

If you only want to evaluate a trained model (and report its parameter count and FLOPs):

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
```
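A filled-in evaluation call might look like the following; again, the config, dataset, and checkpoint paths are placeholders for the actual files and released checkpoints:

```bash
# Hypothetical example: substitute the real config and a downloaded checkpoint.
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 \
    --master_addr="127.0.0.1" --master_port=29501 main.py \
    --cfg configs/vssd/vssd_tiny_224.yaml \
    --batch-size 128 --data-path /data/imagenet --output /tmp \
    --resume checkpoints/vssd_tiny.pth --eval
```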

Detection and Segmentation

To evaluate with mmdetection or mmsegmentation:

```bash
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1
```

Add `--tta` to obtain the multi-scale mIoU (MS) for segmentation.
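For instance, assuming the standard OpenMMLab `dist_test.sh` script, which forwards extra arguments to `test.py`, a multi-scale evaluation of the VSSD-Tiny segmentation model on a single GPU might look like the sketch below; the config and checkpoint paths are placeholders:

```bash
# Hypothetical example: replace the config and checkpoint with the actual released files.
bash ./tools/dist_test.sh segmentation/configs/vssd/upernet_vssd_tiny_512x512_160k_ade20k.py \
    checkpoints/upernet_vssd_tiny.pth 1 --tta
```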

To train with mmdetection or mmsegmentation:

```bash
bash ./tools/dist_train.sh </path/to/config> 8
```
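As a concrete sketch, training the Mask R-CNN 1x detector with a VSSD-Tiny backbone on 8 GPUs might look like this; the config filename is a placeholder for the actual one shipped with the repository:

```bash
# Hypothetical example: the config path is a placeholder.
bash ./tools/dist_train.sh detection/configs/vssd/mask_rcnn_vssd_tiny_fpn_1x_coco.py 8
```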

Citation

If VSSD is helpful for your research, please cite the following paper:

```bibtex
@article{shi2024vssd,
  title={VSSD: Vision Mamba with Non-Causal State Space Duality},
  author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
  journal={arXiv preprint arXiv:2407.18559},
  year={2024}
}
```

Acknowledgment

This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab; thanks for their excellent work.