Home

Awesome

Mamba<sup>®</sup>: Vision Mamba ALSO Needs Registers

Official PyTorch implementation of Mamba<sup>®</sup>, a strong vision backbone with pure Mamba blocks

<img src="Assets\teaser.png" width="80%" />

Models

ModelIN-1k AccuracyCheckpoint
Mamba<sup>®</sup>-Small81.4mambar_small_patch16_224
Mamba<sup>®</sup>-Base83.0mambar_base_patch16_224
Mamba<sup>®</sup>-Large83.6mambar_large_patch16_224

Evaluation

python -m torch.distributed.launch --nproc_per_node=1  --use_env main.py --model mambar_base_patch16_224 --eval --eval-crop-ratio 1.0 --batch 128 --data-path /PATH/TO/IMAGENET --resume /PATH/TO/CHECKPOINT

Environments

# torch>=2.0, cuda>=11.8
pip install timm==0.4.12 mlflow==2.9.1
pip install causal-conv1d==1.1.0
pip install mamba-ssm==1.1.1

Training

Pretrain in 128*128 resolution (single 8-GPU node)

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
  --model mambar_base_patch16_224 \
  --batch 128 --lr 5e-4 --weight-decay 0.05 \
  --data-path /PATH/TO/IMAGENET \
  --output_dir ./output/mambar_base_patch16_224/pt \
  --reprob 0.0 --smoothing 0.0  --repeated-aug --ThreeAugment \
  --epochs 300 --input-size 128 --drop-path 0.1 --dist-eval

Intermediate training in 224*224 resolution

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
  --model mambar_base_patch16_224 \
  --batch 128 --lr 2e-4 --weight-decay 0.05 \
  --data-path /PATH/TO/IMAGENET \
  --finetune ./output/mambar_base_patch16_224/pt/checkpoint.pth \
  --output_dir ./output/mambar_base_patch16_224/mid \
  --reprob 0.0 --smoothing 0.0  --repeated-aug --ThreeAugment \
  --epochs 100 --input-size 224 --drop-path 0.4 --dist-eval

Finetune in 224*224

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
  --model mambar_base_patch16_224 \
  --batch 64 --lr 1e-5 --weight-decay 0.1 --unscale-lr \
  --data-path /PATH/TO/IMAGENET \
  --finetune ./output/mambar_base_patch16_224/mid/checkpoint.pth \
  --output_dir ./output/mambar_base_patch16_224/ft \
  --reprob 0.0 --smoothing 0.1 --no-repeated-aug \
  --aa rand-m9-mstd0.5-inc1 --eval-crop-ratio 1.0 \
  --epochs 20 --input-size 224 --drop-path 0.4

Citation

@article{mamba-r,
  title={Mamba-r: Vision mamba also needs registers},
  author={Wang, Feng and Wang, Jiahao and Ren, Sucheng and Wei, Guoyizhe and Mei, Jieru and Shao, Wei and Zhou, Yuyin and Yuille, Alan and Xie, Cihang},
  journal={arXiv preprint arXiv:2405.14858},
  year={2024}
}