
<div align="center"> <h1>TinyViM</h1> <h3>TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba</h3>

Xiaowen Ma, Zhenliang Ni, Xinghao Chen

Huawei Noah’s Ark Lab

[Paper Link](https://arxiv.org/abs/2411.17473)

</div>

## 🔥 News

## 📷 Introduction

<img src="fig/comp.png" /> <img src="fig/whole.png" />

We build a series of tiny hybrid vision Mamba models, called TinyViM, by integrating mobile-friendly convolutions with an efficient Laplace mixer. The proposed TinyViM achieves impressive performance on several downstream tasks, including image classification, semantic segmentation, object detection, and instance segmentation. In particular, TinyViM outperforms Convolution-, Transformer-, and Mamba-based models of similar scale, with throughput about 2-3 times higher than that of other Mamba-based models.
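The core idea of frequency decoupling can be illustrated with a minimal NumPy sketch (the actual Laplace mixer in the paper is more involved; `avg_pool2d`, `upsample_nearest`, and `laplace_decouple` below are illustrative names, not this repo's API): a pooled-then-upsampled copy of the feature map serves as the low-frequency branch, and the Laplacian-style residual carries the high frequencies, so the split is lossless.

```python
import numpy as np

def avg_pool2d(x, k):
    """Average-pool a (H, W) map by factor k (H and W divisible by k)."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample_nearest(x, k):
    """Nearest-neighbour upsample a (H, W) map by factor k."""
    return np.repeat(np.repeat(x, k, axis=0), k, axis=1)

def laplace_decouple(x, k=2):
    """Split a feature map into low- and high-frequency components.

    The low-frequency part is a blurred (pooled then upsampled) copy;
    the high-frequency part is the residual, so low + high == x exactly.
    """
    low = upsample_nearest(avg_pool2d(x, k), k)
    high = x - low
    return low, high

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
low, high = laplace_decouple(x, k=2)
assert np.allclose(low + high, x)  # lossless split
```

In the paper's design, the two branches can then be mixed at different costs: the smooth low-frequency branch is cheap to process at reduced resolution, while the high-frequency residual preserves edges and fine detail.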

## 🏆 Performance

### 1️⃣ Classification

| Model | Type | Params (M) | GMACs | Throughput (im/s) | Top-1 |
| --- | --- | --- | --- | --- | --- |
| TinyViM-S | CNN-Mamba | 5.6 | 0.9 | 2563 | 79.2 |
| TinyViM-B | CNN-Mamba | 11.0 | 1.5 | 1851 | 81.2 |
| TinyViM-L | CNN-Mamba | 31.7 | 4.7 | 843 | 83.3 |

### 2️⃣ Detection & Instance Segmentation

| Model | Head | AP-box | AP-mask |
| --- | --- | --- | --- |
| TinyViM-B | Mask RCNN | 42.3 | 38.7 |
| TinyViM-L | Mask RCNN | 44.5 | 40.7 |

### 3️⃣ Semantic Segmentation

| Model | Head | Throughput | mIoU |
| --- | --- | --- | --- |
| TinyViM-B | FPN | 180 | 41.9 |
| TinyViM-L | FPN | 111 | 44.2 |

## 📚 Usage example

## 🌟 Citation

If you are interested in our work, please consider giving us a 🌟 and citing the paper below.

```
@misc{tinyvim,
      title={TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba},
      author={Xiaowen Ma and Zhenliang Ni and Xinghao Chen},
      year={2024},
      eprint={2411.17473},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17473},
}
```

## 💡 Acknowledgment

Thanks to the authors of these previously open-sourced repos: EfficientFormer, SwiftFormer, RepViT, mmsegmentation, mmdetection.