<div align="center">
<h1>VSSD</h1>
<h3>VSSD: Vision Mamba with Non-Causal State Space Duality</h3>

Paper: [arXiv:2407.18559](https://arxiv.org/abs/2407.18559)
</div>

## Updates
- **August 5th, 2024**: We release the log and ckpt for VSSD trained with MESA.
- **July 29th, 2024**: When introducing MESA into training, as in MLLA, VSSD-B achieves 85.4% top-1 accuracy on ImageNet-1K!
- **July 25th, 2024**: We release the code, logs, and ckpts for VSSD.
## Introduction
Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherent causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, which has a non-causal format of SSD. This repository contains the code for training and evaluating VSSD variants on the ImageNet-1K dataset for image classification, the COCO dataset for object detection, and the ADE20K dataset for semantic segmentation. For more information, please refer to our [paper](https://arxiv.org/abs/2407.18559).
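The causal/non-causal distinction can be illustrated with a toy contrast between a left-to-right scan and a global aggregation. The sketch below is illustrative only: it drops the per-token decay `A_t` and the chunked matrix form that real SSD kernels use, and it is not the VSSD implementation.

```python
import torch

L, D, N = 8, 4, 16               # tokens, channels, state size
x = torch.randn(L, D)            # token features
B = torch.randn(L, N)            # per-token input projection
C = torch.randn(L, N)            # per-token output projection

# Causal scan: token t only reads tokens <= t (decay A_t omitted, i.e. A_t = 1).
H = torch.zeros(N, D)
y_causal = []
for t in range(L):
    H = H + B[t].unsqueeze(1) * x[t].unsqueeze(0)   # accumulate hidden state
    y_causal.append(C[t] @ H)
y_causal = torch.stack(y_causal)

# Non-causal: every token reads one shared, order-independent global state.
H_global = (B.unsqueeze(-1) * x.unsqueeze(1)).sum(dim=0)   # (N, D)
y_noncausal = C @ H_global                                  # (L, D)

# With A_t = 1, the last causal output coincides with the non-causal one.
assert torch.allclose(y_causal[-1], y_noncausal[-1], atol=1e-5)
```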
<p align="center"> <img src="./assets/overall_arc.jpg" width="800" /> </p>

## Main Results
### Classification on ImageNet-1K
name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
---|---|---|---|---|---|---|---|
VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log | ckpt |
VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log | ckpt |
VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log | ckpt |
VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log | ckpt |
Enhanced models trained with MESA:
name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
---|---|---|---|---|---|---|---|
VSSD-Tiny | ImageNet-1K | 224x224 | 84.1 | 24M | 4.5G | log | ckpt |
VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | log | ckpt |
VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | log | ckpt |
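The released classification checkpoints are standard PyTorch files. The sketch below assumes the Swin-style layout where the weights sit under a `'model'` key (an assumption; adjust if the file stores a bare state dict):

```python
import torch

# Placeholder path; point it at a downloaded VSSD checkpoint.
ckpt = torch.load("vssd_tiny.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)   # fall back to a bare state dict

# model = build_model(config)                # hypothetical: the repo's builder
# model.load_state_dict(state_dict, strict=True)

n_params = sum(v.numel() for v in state_dict.values() if v.is_floating_point())
print(f"{len(state_dict)} tensors, {n_params / 1e6:.1f}M float params")
```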
### Object Detection on COCO
Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs | ckpts |
---|---|---|---|---|---|---|---|
VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log | ckpt |
VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log | ckpt |
VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log | ckpt |
VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log | ckpt |
VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log | ckpt |
VSSD-Small | 59M | 325G | MaskRCNN@3x | 50.0 | 44.6 | - | ckpt |
### Semantic Segmentation on ADE20K
Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | logs | ckpts |
---|---|---|---|---|---|---|---|---|
VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log | ckpt |
VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log | ckpt |
## Getting Started

### Installation
**Step 1: Clone the VSSD repository:**

```bash
git clone https://github.com/YuHengsss/VSSD.git
cd VSSD
```
**Step 2: Environment setup:**

Create and activate a new conda environment:

```bash
conda create -n VSSD
conda activate VSSD
```

Install the dependencies:

```bash
pip install -r requirements.txt
```
Dependencies for detection and segmentation (optional):

```bash
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
```
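As a quick sanity check that the environment is wired up (assuming the optional detection/segmentation dependencies above were installed):

```python
# Verify that the core packages import and that the GPU is visible.
import torch
import mmcv
import mmdet
import mmseg

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__, "| mmdet:", mmdet.__version__, "| mmseg:", mmseg.__version__)
```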
### Quick Start

#### Classification
To train VSSD models for classification on ImageNet, use the following command with the desired configuration:

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
```
If you only want to test the performance (together with params and FLOPs):

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
```
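Params and FLOPs can also be checked outside the training script with `fvcore`, which works on any `torch.nn.Module`. A minimal sketch with a stand-in module (substitute the VSSD model built from your config):

```python
import torch
from fvcore.nn import FlopCountAnalysis

# Stand-in module; replace with the VSSD model built from your config.
model = torch.nn.Conv2d(3, 96, kernel_size=4, stride=4).eval()
x = torch.randn(1, 3, 224, 224)

print(f"params: {sum(p.numel() for p in model.parameters()) / 1e6:.2f}M")
print(f"FLOPs:  {FlopCountAnalysis(model, x).total() / 1e9:.2f}G")
```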
#### Detection and Segmentation

To evaluate with `mmdetection` or `mmsegmentation`:

```bash
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1
```

Use `--tta` to get the mIoU (MS) in segmentation.
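For quick single-image inference rather than full COCO evaluation, mmdetection 3.x also ships a high-level `DetInferencer`. A minimal sketch, assuming the placeholders point at a VSSD detection config and its checkpoint:

```python
from mmdet.apis import DetInferencer

# Both paths are placeholders; fill in a VSSD detection config and checkpoint.
inferencer = DetInferencer(
    model="</path/to/config>",
    weights="</path/to/checkpoint>",
    device="cuda:0",
)
inferencer("demo.jpg", out_dir="outputs/")
```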
To train with `mmdetection` or `mmsegmentation`:

```bash
bash ./tools/dist_train.sh </path/to/config> 8
```
## Citation

If VSSD is helpful for your research, please cite the following paper:
```bibtex
@article{shi2024vssd,
  title={VSSD: Vision Mamba with Non-Causal State Space Duality},
  author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
  journal={arXiv preprint arXiv:2407.18559},
  year={2024}
}
```
## Acknowledgment

This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab; thanks for their excellent work.