Awesome-Vision-Mamba-Models
[NEWS.2024/11/10] The latest version of our paper (v3) is now available! This update includes numerous high-quality papers on visual Mamba.
[NEWS.2024/09/26] 🎉🎉🎉Congratulations to VMamba on being accepted to NeurIPS 2024.
[NEWS.2024/07/06] The updated version of our paper is now available!
[NEWS.2024/05/02] 🎉🎉🎉Congratulations to Vision Mamba on being accepted to ICML 2024.
[NEWS.2024/04/29] Our paper is released!
📢NOTE: If you have any questions, please don't hesitate to contact us at any of the following emails: cseruixu@ust.hk, syangcw@connect.ust.hk, ywangrm@connect.ust.hk, yu.cai@connect.ust.hk.
Mamba, a novel state space model (SSM), has gained recognition across diverse domains for its strong performance and for a computational cost that scales linearly with sequence length. By addressing limitations of traditional visual foundation architectures, such as the limited receptive field of CNNs and the quadratic complexity of self-attention in Vision Transformers, Mamba has emerged as a promising contender for advancing computer vision.
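To make the linear-cost claim concrete, here is a toy sketch of a single-channel selective scan. It is written for illustration only and is not the official Mamba implementation: the step size Δ and the matrices B and C are computed from each input token ("selection"), and the hidden state is updated once per token, so the work grows linearly with sequence length. The diagonal state matrix `a`, the scalar projection weights `w_delta`, `w_b`, `w_c`, and the function names are hypothetical choices; the real model uses learned linear projections and a hardware-aware parallel scan on GPU.

```python
import math
from typing import List


def softplus(z: float) -> float:
    """Numerically stable softplus, log(1 + e^z); keeps the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: List[float],
    a: List[float],
    w_delta: float,
    w_b: List[float],
    w_c: List[float],
) -> List[float]:
    """Toy single-channel selective SSM scan (sequential, O(L * N)).

    x       -- input sequence of length L (one channel)
    a       -- diagonal of the state matrix A (negative entries for stability), size N
    w_delta -- hypothetical weight producing the input-dependent step size delta_t
    w_b     -- hypothetical weights producing the input-dependent B_t (size N)
    w_c     -- hypothetical weights producing the input-dependent C_t (size N)
    """
    if not (len(a) == len(w_b) == len(w_c)):
        raise ValueError("a, w_b and w_c must all have the state size N")
    h = [0.0] * len(a)  # hidden state, carried across time steps
    ys: List[float] = []
    for xt in x:  # one left-to-right pass: cost grows linearly with L
        delta = softplus(w_delta * xt)   # selection: step size depends on x_t
        b = [wb * xt for wb in w_b]      # selection: B_t depends on x_t
        c = [wc * xt for wc in w_c]      # selection: C_t depends on x_t
        for i, ai in enumerate(a):
            a_bar = math.exp(delta * ai)  # zero-order-hold discretization of A
            b_bar = delta * b[i]          # simplified (Euler) discretization of B
            h[i] = a_bar * h[i] + b_bar * xt
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))  # y_t = C_t . h_t
    return ys


if __name__ == "__main__":
    seq = [1.0, 2.0, 3.0]
    print(selective_scan(seq, a=[-1.0, -0.5], w_delta=0.1, w_b=[1.0, 0.5], w_c=[0.3, 0.7]))
```

Because each output depends only on the current and earlier tokens, the scan is causal; this is why vision variants such as Vision Mamba and VMamba scan an image's patch sequence in several directions.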
:star: This repository hosts a curated collection of literature associated with Mamba models in computer vision. Feel free to star and fork. For further details, refer to the following paper:
Visual Mamba: A Survey and New Outlooks<br/> Rui Xu, Shu Yang, Yihui Wang, Yu Cai, Bo Du, Hao Chen<br/> SMART Lab, The Hong Kong University of Science and Technology<br/> <br/>
If you find this repository useful, please cite our paper:
@misc{2024visual_mamba,
title={Visual Mamba: A Survey and New Outlooks},
author={Rui Xu and Shu Yang and Yihui Wang and Yu Cai and Bo Du and Hao Chen},
year={2024},
eprint={2404.18861},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Other works from the HKUST SMART Lab:
@inproceedings{MambaMIL,
author = {Shu Yang and Yihui Wang and Hao Chen},
title = {MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology},
booktitle = {International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
volume = {15004},
pages = {296--306},
publisher = {Springer},
year = {2024}
}
Contents
- Mamba
- Related Survey
- Visual Mamba Backbone Networks
- Vision Application (Modality)
- Valuable Insights
- Other Domains
Mamba
Date | Paper | Figure | Link | Code |
---|---|---|---|---|
Arxiv 23.12.01 (COLM 2024) | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | | Link | Code |
Arxiv 24.05.31 (ICML 2024) | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | | Link | Code |
Related Survey
Date | Paper | Link |
---|---|---|
Arxiv 24.04.15 | State Space Model for New-Generation Network Alternative to Transformers: A Survey | Link |
Arxiv 24.04.24 | A Survey on Visual Mamba | Link |
Arxiv 24.04.24 | Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges | Link |
Arxiv 24.05.07 | Vision Mamba: A Comprehensive Survey and Taxonomy | Link |
Arxiv 24.06.05 | Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Link |
Arxiv 24.06.24 | Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba | Link |
Arxiv 24.08.02 | A Survey of Mamba | Link |
Arxiv 24.10.03 | A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond | Link |
Arxiv 24.10.04 | Mamba in Vision: A Comprehensive Survey of Techniques and Applications | Link |
Visual Mamba Backbone Networks
<img width="600" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/57466105/4843bead-14cd-4aa6-aecf-af9411defc49">Detailed Performance Comparison
Date | Paper | Figure | Link | Code |
---|---|---|---|---|
Arxiv 24.01.17 (ICML 2024) | Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | <img width="684" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/6d32c807-3d2f-457e-8927-fa4bbe595064"> | Link | Code |
Arxiv 24.01.18 (NeurIPS 2024 Spotlight) | VMamba: Visual State Space Model | <img width="806" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/039e24f6-5f89-4772-bb84-7409aeef4da0"> <img width="833" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/75158bbf-18e9-45fc-93e0-7d84c062ed0d"> | Link | Code |
Arxiv 24.02.08 (ECCV 2024 Oral) | Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data | <img width="712" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/0ee52771-63ec-4e1d-bd24-aff9fe83c8e6"> | Link | Code |
Arxiv 24.03.14 | LocalMamba: Visual State Space Model with Windowed Selective Scan | <img width="710" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/1c2bcfb8-72d0-4f33-b561-f926952455ff"> | Link | Code |
Arxiv 24.03.15 | EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba | <img width="719" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/7e034c04-3359-456e-a2b3-720b4b37e975"> | Link | Code |
Arxiv 24.03.22 | SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series | <img width="622" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/bef43ea0-0d1e-4c2f-93e1-231b41394195"> | Link | Code |
Arxiv 24.03.26 (BMVC 2024) | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | <img width="713" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/d1170f9a-b9b2-4c4d-ab44-cd6c52a07c8d"> | Link | Code |
Arxiv 24.03.29 | MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection | | Link | |
Arxiv 24.05.23 (NeurIPS 2024) | Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model | | Link | Code |
Arxiv 24.05.23 | Scalable Visual State Space Model with Fractal Scanning | | Link | |
Arxiv 24.05.23 | Mamba-R: Vision Mamba ALSO Needs Registers | | Link | Code |
Arxiv 24.05.29 | Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain | | Link | Code |
Arxiv 24.06.11 | Autoregressive Pretraining with Mamba in Vision | | Link | Code |
Arxiv 24.07.10 | MambaVision: A Hybrid Mamba-Transformer Vision Backbone | | Link | Code |
Arxiv 24.07.18 | GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model | | Link | Code |
Arxiv 24.07.26 | VSSD: Vision Mamba with Non-Causal State Space Duality | | Link | Code |
Arxiv 24.08.30 | Stochastic Layer-Wise Shuffle: A Good Practice to Improve Vision Mamba Training | | Link | Code |
Arxiv 24.09.15 | SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks | | Link | Code |
Arxiv 24.09.18 | Distillation-free Scaling of Large SSMs for Images and Videos | | Link | |
NeurIPS 24.09.26 | Vision Mamba Mender | | Link | Code |
Arxiv 24.09.27 (NeurIPS 2024) | Exploring Token Pruning in Vision State Space Models | | Link | |
Arxiv 24.10.01 | MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining | | Link | |
Arxiv 24.10.04 | HRVMamba: High-Resolution Visual State Space Model for Dense Prediction | | Link | Code |
Arxiv 24.10.09 (NeurIPS 2024) | QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model | | Link | Code |
Arxiv 24.10.14 | GlobalMamba: Global Image Serialization for Vision Mamba | | Link | Code |
Arxiv 24.10.14 | V2M: Visual 2-Dimensional Mamba for Image Representation Learning | | Link | Code |
Arxiv 24.10.19 | Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion | | Link | Code |
Arxiv 24.10.22 | EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality | | Link | Code |
Arxiv 24.11.24 | MobileMamba: Lightweight Multi-Receptive Visual Mamba Network | | Link | Code |
Arxiv 24.11.26 | TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba | | Link | Code |
Arxiv 24.12.17 | GG-SSMs: Graph-Generating State Space Models | | Link | |
Arxiv 24.12.17 | Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training | | Link | Code |
Vision Application
Image
Natural Image
Date | Paper | Figure | Link | Code | Task |
---|---|---|---|---|---|
Arxiv 24.02.06 | U-shaped Vision Mamba for Single Image Dehazing | <img width="848" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/3ca0831b-711c-4073-841e-2eba4f2e718d"> | Link | Code | Dehazing/Low Light Enhancement/Deraining |
Arxiv 24.02.08 | Scalable Diffusion Models with State Space Backbone | <img width="588" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/9d900e4b-4c3c-427a-a857-681a3f3470dd"> | Link | Code | Image Generation |
Arxiv 24.02.23 (ECCV 2024) | MambaIR: A Simple Baseline for Image Restoration with State-Space Model | <img width="708" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/22041ebc-cae7-4e72-a537-a7af3429b6d8"> | Link | Code | Super-resolution/Denoising |
Arxiv 24.03.04 (TGRS 2024) | MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection | <img width="847" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/c38ffac7-65b7-452c-b0ec-c3a17f8de860"> | Link | Code | Infrared Image Segmentation |
Arxiv 24.03.13 | Activating Wider Areas in Image Super-Resolution | <img width="700" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/dfaf5b9a-19e0-4058-a4aa-a9af26df6334"> | Link | | Super-resolution |
Arxiv 24.03.18 | VmambaIR: Visual State Space Model for Image Restoration | <img width="485" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/1126cd48-1c85-4c09-883f-c4a50b922fd0"> | Link | Code | Image Restoration |
Arxiv 24.03.20 (ECCV 2024) | ZigMa: A DiT-style Zigzag Mamba Diffusion Model | <img width="702" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/3a14b8da-188b-4c00-a054-c4cb47562f9e"> | Link | Code | Generation |
Arxiv 24.03.27 | Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction | <img width="564" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/361b14b9-6291-47d1-b8e0-ae667db5aa22"> | Link | | 3D Reconstruction |
Arxiv 24.03.29 | Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring | <img width="730" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/a84a1311-1ed4-4c51-828c-94b9e5b95578"> | Link | | Image Deblurring |
Arxiv 24.04.04 | InsectMamba: Insect Pest Classification with State Space Model | <img width="554" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/2bb11b9a-c952-4f12-afd9-1ba2bce3ce9c"> | Link | | Image Classification |
Arxiv 24.04.09 (NeurIPS 2024) | MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection | <img width="793" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/0d601311-b0bf-48e1-b0d6-17ee9c3101d0"> | Link | Code | Anomaly Detection |
Arxiv 24.04.11 (ACM MM 2024) | DGMamba: Domain Generalization via Generalized State Space Model | <img width="720" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/50d5a8bb-d701-40a1-9f17-52a7d9c96221"> | Link | Code | Domain Generalization |
Arxiv 24.04.15 (ACM MM 2024) | FreqMamba: Viewing Mamba from a Frequency Perspective for Image Deraining | <img width="798" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/430aabed-9c3f-40f5-b062-d748829d20fa"> | Link | Code | Deraining |
Arxiv 24.04.17 | CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration | <img width="1102" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/d4c0ac33-8541-4cf0-84aa-bd5b0958516f"> | Link | | Denoising/Deblurring |
Arxiv 24.04.22 | MambaUIE: Unraveling the Ocean's Secrets with Only 2.8 GFLOPs | <img width="687" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/7e84a297-ea0f-4e27-b9cc-cdb36b2b0e6f"> | Link | Code | Image Enhancement |
Arxiv 24.05.03 | FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space | | Link | Code | Emotion recognition & Facial Expression Recognition & Detection |
Arxiv 24.05.05 (CVPR 2024 Workshop) | DVMSR: Distillated Vision Mamba for Efficient Super-Resolution | | Link | Code | Super-Resolution |
Arxiv 24.05.05 | SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion | | Link | | Motion Style Transfer |
Arxiv 24.05.06 | Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement | | Link | Code | Image Enhancement |
Arxiv 24.05.07 | VMambaCC: A Visual State Space Model for Crowd Counting | | Link | | Crowd Counting |
Arxiv 24.05.14 | WaterMamba: Visual State Space Model for Underwater Image Enhancement | <img width="551" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/2307c102-8490-44d7-bf66-add22160739d"> | Link | | Image Enhancement |
Arxiv 24.05.16 | IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model | | Link | Code | Infrared Image Super-resolution |
Arxiv 24.05.23 | Efficient Visual State Space Model for Image Deblurring | <img width="540" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/a1f85d94-136f-432b-ba51-de322b500539"> | Link | Code | Image Deblurring |
Arxiv 24.05.23 | DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis | | Link | Code | Generation |
Arxiv 24.05.25 | Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation | <img width="548" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/e904635f-26e9-4690-b215-f9ea741bb354"> | Link | | Generation |
Arxiv 24.05.25 (NeurIPS 2024) | MambaLLIE: Implicit Retinex-Aware Low Light Enhancement with Global-then-Local State Space | | Link | Code | Image Enhancement |
Arxiv 24.05.26 | Image Deraining with Frequency-Enhanced State Space Model | <img width="439" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/8fd63341-7281-4911-a506-a814f5fb56df"> | Link | | Image Deraining |
Arxiv 24.05.28 | MambaVC: Learned Visual Compression with Selective State Spaces | <img width="533" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/9591e9ef-7984-4851-bf09-dee72676c5e4"> | Link | Code | Visual Compression |
Arxiv 24.05.29 | FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining | | Link | | Image Deraining |
Arxiv 24.06.03 | LLEMamba: Low-Light Enhancement via Relighting-Guided Mamba with Deep Unfolding Network | | Link | | Low-Light Enhancement |
Arxiv 24.06.06 | MambaDepth: Enhancing Long-range Dependency for Self-Supervised Fine-Structured Monocular Depth Estimation | | Link | | Depth Estimation |
Arxiv 24.06.09 | Mamba YOLO: SSMs-Based YOLO For Object Detection | | Link | Code | Object Detection |
Arxiv 24.06.12 | PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement | | Link | Code | Image Enhancement |
Arxiv 24.06.18 | LFMamba: Light Field Image Super-Resolution with State Space Model | | Link | Code | Super-resolution |
Arxiv 24.06.13 | Q-Mamba: On First Exploration of Vision Mamba for Image Quality Assessment | | Link | | Image Quality Assessment |
Arxiv 24.06.23 | Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning | | Link | | Super-resolution |
Arxiv 24.06.24 | Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces | | Link | | Crack Segmentation |
Arxiv 24.06.25 (WACV 2025) | SUM: Saliency Unification through Mamba for Visual Attention Modeling | | Link | Code | Visual Saliency Prediction |
Arxiv 24.07.02 (ECCV 2024) | MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders | | Link | Code | Multi-Task Dense Scene Understanding |
Arxiv 24.07.08 | Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning | | Link | Code | Few-Shot Class-Incremental Learning |
Arxiv 24.07.11 (ICML 2024 Workshop) | Parallelizing Autoregressive Generation with Variational State Space Models | | Link | | Generation |
Arxiv 24.07.12 (NeurIPS 2024) | Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba | | Link | Code | 3D Hand Reconstruction |
Arxiv 24.07.16 | PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer | | Link | | Image Classification/Object Detection/Point Cloud Object Detection |
Arxiv 24.07.22 | Mamba meets crack segmentation | | Link | Code | Segmentation |
Arxiv 24.07.23 | MxT: Mamba x Transformer for Image Inpainting | | Link | | Image Inpainting |
Arxiv 24.07.25 | ALMRR: Anomaly Localization Mamba on Industrial Textured Surface with Feature Reconstruction and Refinement | | Link | Code | Anomaly Localization |
Arxiv 24.07.27 | Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint | | Link | Code | Image Enhancement |
Arxiv 24.07.27 (WBIR 2024 Workshop) | Mamba? Catch The Hype Or Rethink What Really Helps for Image Registration | | Link | Code | Image Registration |
Arxiv 24.08.01 | MonoMM: A Multi-scale Mamba-Enhanced Network for Real-time Monocular 3D Object Detection | | Link | | Monocular 3D Object Detection |
Arxiv 24.08.02 (ACM MM 2024) | Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement | | Link | Code | Image Enhancement |
Arxiv 24.08.04 | DeMansia: Mamba Never Forgets Any Tokens | | Link | Code | Classification |
Arxiv 24.08.05 | LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba | | Link | | Generation |
Arxiv 24.08.06 | Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network | | Link | | Human Pose Estimation |
Arxiv 24.08.07 | PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model | | Link | | Human Pose Estimation |
Arxiv 24.08.11 | Neural Architecture Search based Global-local Vision Mamba for Palm-Vein Recognition | | Link | | Palm-Vein Recognition |
Arxiv 24.08.16 | QMambaBSR: Burst Image Super-Resolution with Query State Space Model | | Link | | Super-Resolution |
Arxiv 24.08.19 | Multi-Scale Representation Learning for Image Restoration with State-Space Model | | Link | | Image Restoration |
Arxiv 24.08.21 | MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering | | Link | Code | Occupancy Prediction |
Arxiv 24.08.21 | MambaCSR: Dual-Interleaved Scanning for Compressed Image Super-Resolution With SSMs | | Link | Code | Super-resolution |
Arxiv 24.08.22 | Scalable Autoregressive Image Generation with Mamba | | Link | Code | Generation |
Arxiv 24.08.23 | O-Mamba: O-shape State-Space Model for Underwater Image Enhancement | | Link | Code | Image Enhancement |
Arxiv 24.08.27 | ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning | | Link | Code | Zero-Shot Learning |
Arxiv 24.08.27 | MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders | | Link | Code | Multi-Task Dense Scene Understanding |
Arxiv 24.08.31 | A Hybrid Transformer-Mamba Network for Single Image Deraining | | Link | Code | Deraining |
Arxiv 24.09.02 (ICPR) | DS MYOLO: A Reliable Object Detector Based on SSMs for Driving Scenarios | | Link | | Object Detection |
Arxiv 24.09.09 | DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification | | Link | | Driver Distraction Identification |
Arxiv 24.09.11 | Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement | | Link | Code | Image Enhancement |
Arxiv 24.09.15 (ECCV 2024 Workshop) | Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion | | Link | Code | Efficiency |
Arxiv 24.09.16 | Mamba-ST: State Space Model for Efficient Style Transfer | | Link | Code | Style Transfer |
Arxiv 24.09.20 (ACCV 2024) | OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping | | Link | Code | Bird's-Eye-View Semantic Mapping |
Arxiv 24.09.25 | Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement | | Link | Code | Image Enhancement |
Neurocomputing 24.09.28 | MambaTSR: You only need 90k parameters for traffic sign recognition | | Link | Code | Traffic Sign Recognition |
Scientific Reports 24.09.28 | Toward identity preserving in face sketch-photo synthesis using a hybrid CNN-Mamba framework | | Link | | Sketch-photo Synthesis |
Arxiv 24.09.29 (NeurIPS 2024) | Hybrid Mamba for Few-Shot Segmentation | | Link | Code | Few-Shot Segmentation |
Arxiv 24.10.05 | Mamba Capsule Routing Towards Part-Whole Relational Camouflaged Object Detection | | Link | Code | Camouflaged Object Detection |
Arxiv 24.10.14 | Hi-Mamba: Hierarchical Mamba for Efficient Image Super-Resolution | | Link | | Super-resolution |
Arxiv 24.10.16 | MambaBEV: An efficient 3D detection model with Mamba2 | | Link | | 3D Object Detection |
Arxiv 24.10.21 (NeurIPS 2024) | START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation | | Link | Code | Domain Generalization |
Arxiv 24.10.25 (JAC 2024) | Topology-aware Mamba for Crack Segmentation in Structures | | Link | Code | Crack Segmentation |
Arxiv 24.10.27 (ACCV 2024) | Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement | | Link | Code | Image Enhancement |
Arxiv 24.10.28 (NeurIPS 2024) | ECMamba: Consolidating Selective State Space Model with Retinex Guidance for Efficient Multiple Exposure Correction | | Link | Code | Multiple Exposure Correction |
ACM MM 24.10.28 | Realistic Full-Body Motion Generation from Sparse Tracking with State Space Model | | Link | | Motion Generation |
Arxiv 24.10.30 | Adaptive Multi Scale Document Binarisation Using Vision Mamba | | Link | | Document Binarisation |
Arxiv 24.11.05 | ShadowMamba: State-Space Model with Boundary-Region Selective Scan for Shadow Removal | | Link | | Shadow Removal |
Arxiv 24.11.06 | MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba | | Link | | Parameter-Efficient Fine-Tuning |
Arxiv 24.11.06 (NeurIPS 2024) | DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation | | Link | Code | Generation |
Arxiv 24.11.10 (WACV 2025) | SEM-Net: Efficient Pixel Modelling for Image Inpainting with Spatially Enhanced SSM | | Link | Code | Inpainting |
Arxiv 24.11.11 (SPL 2024) | LFSamba: Marry SAM with Mamba for Light Field Salient Object Detection | | Link | Code | Detection |
Arxiv 24.11.12 (NeurIPS 2024 Workshop) | Diverse capability and scaling of diffusion and auto-regressive models when learning abstract rules | | Link | | Rule Learning/Reasoning |
Arxiv 24.11.15 | M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation | | Link | Code | Generation |
Arxiv 24.11.16 | S3Mamba: Arbitrary-Scale Super-Resolution via Scaleable State Space Model | | Link | Code | Super-resolution |
Arxiv 24.11.21 | Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation | | Link | | Parameter Efficient Fine Tuning |
Arxiv 24.11.22 | OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction | | Link | | Exposure Correction |
Arxiv 24.11.22 | MambaIRv2: Attentive State Space Restoration | | Link | Code | Restoration |
Arxiv 24.11.23 | Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning | | Link | | Continual Learning |
Arxiv 24.11.24 | MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking | | Link | | Night UAV Tracking |
Arxiv 24.11.25 | Deformable Mamba for Wide Field of View Segmentation | | Link | Code | Wide Field of View Segmentation |
Arxiv 24.11.27 | Vision Mamba Distillation for Low-resolution Fine-grained Image Classification | | Link | Code | Classification |
Arxiv 24.12.01 | MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning | | Link | | Night UAV Tracking |
Arxiv 24.12.01 | Learning Mamba as a Continual Learner | | Link | | Continual Learning |
Arxiv 24.12.02 | MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | | Link | Code | Keypoint Detection |
Arxiv 24.12.10 | MPSI: Mamba enhancement model for pixel-wise sequential interaction Image Super-Resolution | | Link | | Super-resolution |
Arxiv 24.12.12 (AAAI 2025) | Selective Visual Prompting in Vision Mamba | | Link | Code | Domain Adaptation |
Arxiv 24.12.13 | XYScanNet: An Interpretable State Space Model for Perceptual Image Deblurring | | Link | | Deblurring |
Remote Sensing Image
Date | Paper | Figure | Link | Code | Task |
---|---|---|---|---|---|
Arxiv 24.03.28 (GRSL 2024) | RSMamba: Remote Sensing Image Classification with State Space Model | | Link | Code | Remote Sensing Images Classification |
Arxiv 24.04.02 | Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model | <img width="402" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/c64ee6cc-ced1-4d27-b1af-27582f089fb0"> | Link | Code | Semantic Segmentation |
Arxiv 24.04.03 (GRSL 2024) | RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation | <img width="502" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/1767d964-15ea-4085-a1f0-937cba3cf915"> | Link | Code | Semantic Segmentation |
Arxiv 24.04.03 | RS-Mamba for Large Remote Sensing Image Dense Prediction | <img width="942" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/4c47c3b7-5df8-4d77-93ba-d35263916f03"> | Link | Code | Semantic Segmentation/Change Detection |
Arxiv 24.04.04 (TGRS 2024) | ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model | <img width="1023" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/476bf8d5-625e-4e65-ad19-bc14817c9a58"> | Link | Code | Change Detection/Building Damage Assessment |
Arxiv 24.04.12 | SpectralMamba: Efficient Mamba for Hyperspectral Image Classification | | Link | Code | Hyperspectral Image Classification |
Arxiv 24.04.15 | HSIDMamba: Exploring Bidirectional State-Space Models for Hyperspectral Denoising | <img width="947" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/91ebd913-ef36-400e-a83c-8d24fc5536b3"> | Link | | Hyperspectral Denoising |
Arxiv 24.04.28 | S2Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification | | Link | Code | Hyperspectral Image Classification |
Arxiv 24.04.29 | Spectral-Spatial Mamba for Hyperspectral Image Classification | | Link | | Hyperspectral Image Classification |
Arxiv 24.05.02 | SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients | | Link | Code | Detection |
Arxiv 24.05.02 (TGRS 2024) | SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising | | Link | Code | Hyperspectral Image Denoising |
Arxiv 24.05.08 (TMM 2024) | Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution | <img width="745" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/51b4b08f-8086-4e57-8006-3e9ba06ff205"> | Link | Code | Super-resolution |
Arxiv 24.05.13 | GMSR: Gradient-Guided Mamba for Spectral Reconstruction from RGB Images | | Link | Code | Spectral Reconstruction from RGB Images |
Arxiv 24.05.14 | Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study | | Link | | Semantic Segmentation |
Arxiv 24.05.16 | RSDehamba: Lightweight Vision Mamba for Remote Sensing Satellite Image Dehazing | | Link | | Dehazing |
Arxiv 24.05.17 | CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation | | Link | Code | Semantic Segmentation |
Arxiv 24.05.20 | Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification | | Link | Code | Hyperspectral Image Classification |
Arxiv 24.05.21 | 3DSS-Mamba: 3D-Spectral-Spatial Mamba for Hyperspectral Image Classification | | Link | | Hyperspectral Image Classification |
Arxiv 24.06.01 | Dual Hyperspectral Mamba for Efficient Spectral Compressive Imaging | | Link | Code | Spectral Compressive Imaging |
Arxiv 24.06.06 | CDMamba: Remote Sensing Image Change Detection with Mamba | | Link | Code | Change Detection |
Arxiv 24.06.09 | HDMba: Hyperspectral Remote Sensing Imagery Dehazing with State Space Model | | Link | Code | Dehazing |
Arxiv 24.06.11 | DualMamba: A Lightweight Spectral-Spatial Mamba-Convolution Network for Hyperspectral Image Classification | | Link | | Hyperspectral Image Classification |
Arxiv 24.06.16 | PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery | | Link | Code | Semantic Segmentation |
Arxiv 24.07.08 | A Mamba-based Siamese Network for Remote Sensing Change Detection | | Link | Code | Change Detection |
Arxiv 24.07.09 | HTD-Mamba: Efficient Hyperspectral Target Detection with Pyramid State Space Model | | Link | Code | Hyperspectral Target Detection |
Arxiv 24.07.11 | DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing | | Link | Code | Oriented Object Detection |
Arxiv 24.07.11 | GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification | | Link | Code | Hyperspectral Image Classification |
TGRS 24.07.19 | MambaHSI: Spatial–Spectral Mamba for Hyperspectral Image Classification | | Link | Code | Hyperspectral Image Classification |
Arxiv 24.08.01 | Empowering Snapshot Compressive Imaging: Spatial-Spectral State Space Model with Across-Scanning and Local Enhancement | | Link | | Snapshot Compressive Imaging |
Arxiv 24.08.02 | Multi-head Spatial-Spectral Mamba for Hyperspectral Image Classification | | Link | Code | Hyperspectral Image Classification |
Arxiv 24.08.02 | WaveMamba: Spatial-Spectral Wavelet Mamba for Hyperspectral Image Classification | | Link | Code | Hyperspectral Image Classification |
Arxiv 24.08.02 | Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification | | Link | Code | Hyperspectral Image Classification |
Arxiv 24.08.21 (GRSL 2024) | UNetMamba: An Efficient UNet-Like Mamba for Semantic Segmentation of High-Resolution Remote Sensing Images | | Link | Code | Semantic Segmentation |
Arxiv 24.08.26 | MSFMamba: Multi-Scale Feature Fusion State Space Model for Multi-Source Remote Sensing Image Classification | | Link | Code | Hyperspectral Image Classification |
GRSL 24.09.02 | MambaFormerSR: A Lightweight model for Remote-Sensing Image Super-Resolution | | Link | | Super-resolution |
Arxiv 24.09.05 | UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images | | Link | | Segmentation |
Arxiv 24.09.10 | PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation | | Link | | Semantic Segmentation |
Arxiv 24.09.15 | SITSMamba for Crop Classification based on Satellite Image Time Series | | Link | Code | SITS Classification |
Scientific Reports 24.09.27 | YOLOv5_mamba: unmanned aerial vehicle object detection based on bidirectional dense feedback network and adaptive gate feature fusion | | Link | Code | Object Detection |
Arxiv 24.10.07 | IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification | | Link | | Hyperspectral Image Classification |
Arxiv 24.10.07 (ECML/PKDD 2024 Workshop) | A Deep Learning-Based Approach for Mangrove Monitoring | | Link | Code | Segmentation |
Arxiv 24.10.08 | Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion | | Link | | Segmentation |
Arxiv 24.10.17 | RemoteDet-Mamba: A Hybrid Mamba-CNN Network for Multi-modal Object Detection in Remote Sensing Images | | Link | | Object Detection |
TGRS 24.10.17 | HyperMamba: A Spectral-Spatial Adaptive Mamba for Hyperspectral Image Classification | | Link | Code | Hyperspectral Image Classification |
ACM MM 24.10.28 | VmambaSCI: Dynamic Deep Unfolding Network with Mamba for Compressive Spectral Imaging | | Link | | Compressive Spectral Imaging |
Arxiv 24.11.12 | MaDiNet: Mamba Diffusion Network for SAR Target Detection | | Link | Code | SAR Target Detection |
Arxiv 24.11.12 | CDXFormer: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory | | Link | Code | Change Detection |
TGRS 24.11.11 | ConMamba: CNN and SSM High-Performance Hybrid Network for Remote Sensing Change Detection | | Link | | Change Detection |
TGRS 24.11.18 | A Novel Remote Sensing Image Change Detection Approach Based on Multi-level State Space Model | | Link | Code | Change Detection |
TGRS 24.11.26 | Dynamic Token Augmentation Mamba for Cross-Scene Classification of Hyperspectral Image | | Link | Code | Cross-Scene Classification |
GRSL 24.11.27 | PPMamba:Enhancing Semantic Segmentation in Remote Sensing Imagery by SS2D | Link | Code | Semantic Segmentation |
Medical Image
Date | Paper | Figure | Link | Code | Task |
---|---|---|---|---|---|
Arxiv 24.01.09 | U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation | | Link | Code | 2D Medical Segmentation/ </br> 3D Medical Segmentation |
Arxiv 24.01.24 (MICCAI 2024) | SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation | <img width="635" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/690c8341-1f17-4f8f-929a-d7f31094ad64"> | Link | Code | 3D Medical Segmentation |
Arxiv 24.02.04 | VM-UNet: Vision Mamba UNet for Medical Image Segmentation | <img width="544" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/320dda01-12dc-4e37-992d-8551c99b475a"> | Link | Code | 2D Medical Segmentation |
Arxiv 24.02.05 | nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model | <img width="949" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/2b1669ec-d1f5-4c6c-a743-1620ab83fef3"> | Link | Code | 3D Medical Segmentation |
Arxiv 24.02.05 (MICCAI 2024) | Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining | <img width="711" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/2b1c8b89-2b8c-4273-ae25-833f87fc97c2"> | Link | Code | 2D Medical Segmentation |
Arxiv 24.02.07 | Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation | <img width="698" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/0a09dda6-986b-480d-8445-1db4a02f16f1"> | Link | Code | 2D Medical Segmentation |
Arxiv 24.02.09 | FD-Vision Mamba for Endoscopic Exposure Correction | <img width="666" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/f87cc9c6-efa9-40ca-bf76-af7dd19b2277"> | Link | Code | Endoscopic Exposure Correction |
Arxiv 24.02.11 (KBS 2024) | Semi-Mamba-UNet: Pixel-Level Contrastive Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation | <img width="623" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/e702b599-682b-42b0-8477-a72972843803"> | Link | Code | 2D Medical Segmentation |
Arxiv 24.02.13 | P-Mamba: Marrying Perona Malik Diffusion with Mamba for Efficient Pediatric Echocardiographic Left Ventricular Segmentation | <img width="717" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/cbbc8a01-b1bb-44bf-8954-4485605a8326"> | Link | | 2D Medical Segmentation |
Arxiv 24.02.16 | Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation | <img width="706" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/909947ae-9ba2-47f9-b257-620663d55820"> | Link | Code | 2D Medical Segmentation |
Arxiv 24.02.28 | MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation | <img width="733" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/ea55e5c2-27bb-4155-a1b3-769fbb46c1f3"> | Link | Code | Medical Image Reconstruction/Uncertainty Estimation |
Arxiv 24.03.06 | MedMamba: Vision Mamba for Medical Image Classification | | Link | Code | 2D Medical Classification |
Arxiv 24.03.08 | LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation | <img width="587" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/20237455-6ec6-49a1-81ba-05553c69910d"> | Link | Code | 2D Medical Segmentation/ </br> 3D Medical Segmentation |
Arxiv 24.03.08 (BIBM 2024) | MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models | | Link | | Cancer Subtyping |
Arxiv 24.03.11 (MICCAI 2024) | MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology | <img width="516" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/fc147a4e-8a81-4862-b222-b52929def042"> | Link | Code | Cancer Subtyping/ </br> Survival Prediction |
Arxiv 24.03.12 | Large Window-based Mamba UNet for Medical Image Segmentation: Beyond Convolution and Self-attention | <img width="848" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/2e91f2b6-e13d-48b1-9012-91b8ce5f1f43"> | Link | Code | 2D Medical Segmentation/ </br> 3D Medical Segmentation |
Arxiv 24.03.12 (MICCAI 2024) | LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.03.13 | MD-Dose: A Diffusion Model based on the Mamba for Radiotherapy Dose Prediction | <img width="683" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/4440c5f1-1197-4295-b585-52314a144539"> | Link | Code | Radiation Dose Prediction (Segmentation) |
Arxiv 24.03.14 (ISBRA 2024) | VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation | <img width="702" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/3f285231-30db-4737-a790-e69f0646d155"> | Link | Code | 2D Medical Segmentation |
Arxiv 24.03.20 | H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation | <img width="748" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/88ae6463-46e9-4c84-a658-160bbbf4d9cf"> | Link | Code | 2D Medical Segmentation |
Arxiv 24.03.20 | ProMamba: Prompt-Mamba for polyp segmentation | <img width="741" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/56106afc-1d80-42db-bb9f-a6daaed7abc8"> | Link | | 2D Medical Segmentation |
Arxiv 24.03.25 | CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification | <img width="707" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/c71f93de-e730-40b2-bb23-74a07c868ab7"> | Link | | Alzheimer’s Disease Classification (CT/MRI) |
Arxiv 24.03.26 | Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion | <img width="622" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/c0b134c5-f7e4-4794-86e2-b20ddca84469"> | Link | | 2D Medical Segmentation (2D MRI) |
Arxiv 24.03.26 | Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models | <img width="633" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/5d4acab4-a1bb-4563-8551-151295f08bf2"> | Link | | Image Restoration |
Arxiv 24.03.26 | Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation | <img width="830" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/100bc4e7-65e1-43d9-815d-99b394e12b4f"> | Link | | 2D Medical Segmentation |
Arxiv 24.03.29 | UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation | <img width="725" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/14d04b8b-cbe6-429d-984d-3ac7dd894bf3"> | Link | Code | 2D Medical Segmentation |
Arxiv 24.04.01 | T-Mamba: Frequency-Enhanced Gated Long-Range Dependency for Tooth 3D CBCT Segmentation | <img width="603" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/b01c38ef-6623-4e9b-867b-ba6f39575b5c"> | Link | Code | 3D Medical Segmentation (Tooth) |
Arxiv 24.04.10 (MIDL 2024) | ViM-UNet: Vision Mamba for Biomedical Segmentation | <img width="581" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/1c32deb4-7695-4cb6-bbc7-5912a69bed98"> | Link | Code | 2D Medical Segmentation (Cell/Neurite) |
Arxiv 24.04.15 (MICCAI 2024) | nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | | Link | Code | 3D Medical Segmentation |
Arxiv 24.04.19 (CVPR 2024 Workshop) | Vim4Path: Self-Supervised Vision Mamba for Histopathology Images | <img width="939" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/9e5d4ef7-89f5-47da-bf5e-38367997f54f"> | Link | Code | Cancer Subtyping |
Arxiv 24.04.26 | Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment | | Link | | Universal Lesion Segmentation |
Arxiv 24.04.26 | Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model | | Link | | ODT Sparse Reconstruction |
Arxiv 24.05.05 | AC-MAMBASEG: An adaptive convolution and Mamba-based architecture for enhanced skin lesion segmentation | | Link | Code | Skin Lesion Segmentation |
Arxiv 24.05.08 | HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation | <img width="689" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/dd95c6d9-9d34-4460-9936-e6de2971dab8"> | Link | | 2D Medical Segmentation |
Arxiv 24.05.09 | VM-DDPM: Vision Mamba Diffusion for Medical Image Synthesis | <img width="724" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/fe63dac2-8500-48cb-8237-2b0c02f5be38"> | Link | | Medical Image Generation |
Arxiv 24.05.24 | MUCM-Net: A Mamba Powered UCM-Net for Skin Lesion Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.05.25 | UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation | | Link Link | | Medical Image Segmentation |
Arxiv 24.05.27 | TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction | | Link | Code | Pre-training/Medical Image Segmentation |
Arxiv 24.05.27 | Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba | | Link | | Medical Image Reconstruction |
Arxiv 24.05.28 (MICCAI 2024 Oral) | Cardiovascular Disease Detection from Multi-View Chest X-rays with BI-Mamba | | Link | Code | CVD Risk Prediction |
Arxiv 24.06.01 | SAM-VMNet: Deep Neural Networks For Coronary Angiography Vessel Segmentation | <img width="602" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/b5d17a68-7e29-4012-9d2e-449bba21745d"> | Link | | Medical Image Segmentation |
Arxiv 24.06.05 | Combining Graph Neural Network and Mamba to Capture Local and Global Tissue Spatial Relationships in Whole Slide Images | | Link | Code | Cancer Subtyping/Survival Prediction |
Arxiv 24.06.09 | Vision Mamba: Cutting-Edge Classification of Alzheimer's Disease with 3D MRI Scans | | Link | | 3D Medical Classification |
Arxiv 24.06.09 (WACV 2025) | Convolution and Attention-Free Mamba-based Cardiac Image Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.06.10 | MHS-VM: Multi-Head Scanning in Parallel Subspaces for Vision Mamba | | Link | Code | Medical Image Segmentation |
Arxiv 24.06.12 (BMVC 2024) | On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models | | Link | Code | Medical Image Segmentation |
Arxiv 24.06.22 | Soft Masked Mamba Diffusion Model for CT to MRI Conversion | | Link | Code | CT to MRI Conversion |
Arxiv 24.07.04 (MICCAI 2024 Workshop) | Vision Mamba for Classification of Breast Ultrasound Images | | Link | | Classification |
Arxiv 24.07.08 (MICCAI 2024) | Deform-Mamba Network for MRI Super-Resolution | | Link | | Super-resolution |
Arxiv 24.07.08 | Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution | | Link | | Super-resolution |
Arxiv 24.07.11 | SR-Mamba: Effective Surgical Phase Recognition with State Space Model | | Link | Code | Surgical Phase Recognition |
Arxiv 24.07.11 | SliceMamba for Medical Image Segmentation | | Link | | Medical Image Segmentation |
Arxiv 24.08.14 | Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark | | Link | | Medical Image Segmentation |
Arxiv 24.08.15 | MambaMIM: Pre-training Mamba with State Space Token-interpolation | | Link | Code | Medical Image Segmentation |
Arxiv 24.08.21 | HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.08.23 | Hierarchical Spatio-Temporal State-Space Modeling for fMRI Analysis | | Link | | Medical Image Classification and Regression |
Arxiv 24.08.25 | MSVM-UNet: Multi-Scale Vision Mamba UNet for Medical Image Segmentation | | Link | Code | Medical Image Segmentation |
KDD Workshop 24.08.25 | State Space Model-based Classification of Major Depressive Disorder Across Multiple Imaging Sites | | Link | | Medical Image Classification |
Arxiv 24.08.26 (MICCAI 2024) | ShapeMamba-EM: Fine-Tuning Foundation Model with Local Shape Descriptors and Mamba Blocks for 3D EM Image Segmentation | | Link | | Medical Image Segmentation |
Arxiv 24.08.26 | LoG-VMamba: Local-Global Vision Mamba for Medical Image Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.08.28 | SpineMamba: Enhancing 3D Spinal Segmentation in Clinical Imaging through Residual Visual Mamba Layers and Shape Priors | | Link | | Medical Image Segmentation |
Scientific Reports 24.08.28 | A mixed Mamba U-net for prostate segmentation in MR images | | Link | | Medical Image Segmentation |
Arxiv 24.09.06 | MpoxMamba: A Grouped Mamba-based Lightweight Hybrid Network for Mpox Detection | | Link | Code | Medical Image Classification |
Arxiv 24.09.06 | Serp-Mamba: Advancing High-Resolution Retinal Vessel Segmentation with Selective State-Space Model | | Link | | Medical Image Segmentation |
Arxiv 24.09.09 | SX-Stitch: An Efficient VMS-UNet Based Framework for Intraoperative Scoliosis X-Ray Image Stitching | | Link | | Medical Image Stitching |
Arxiv 24.09.12 | Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters | | Link | Code | Medical Image Classification |
Arxiv 24.09.12 | OCTAMamba: A State-Space Model Approach for Precision OCTA Vasculature Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.09.12 | MedSegMamba: 3D CNN-Mamba Hybrid Architecture for Brain Segmentation | | Link | | Medical Image Segmentation |
Arxiv 24.09.13 (MICCAI 2024) | Tri-Plane Mamba: Efficiently Adapting Segment Anything Model for 3D Medical Images | | Link | Code | Medical Image Segmentation |
Arxiv 24.09.17 (ACCV 2024 Workshop) | SkinMamba: A Precision Skin Lesion Segmentation Architecture with Cross-Scale Global State Modeling and Frequency Boundary Guidance | | Link | Code | Medical Image Segmentation |
Arxiv 24.09.18 | SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba | | Link | | Surgical Phase Recognition |
Arxiv 24.09.19 | MambaRecon: MRI Reconstruction with Structured State Space Models | | Link | Code | Medical Image Reconstruction |
Arxiv 24.09.19 | MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.09.24 | Segmentation Strategies in Deep Learning for Prostate Cancer Diagnosis: A Comparative Study of Mamba, SAM, and YOLO | | Link | Code | Medical Image Segmentation |
Arxiv 24.09.25 | Classification of Gleason Grading in Prostate Cancer Histopathology Images Using Deep Learning Techniques: YOLO, Vision Transformers, and Vision Mamba | | Link | Code | Medical Image Classification |
Arxiv 24.09.26 (MICCAI 2024) | EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.09.28 | MambaEviScrib: Mamba and Evidence-Guided Consistency Make CNN Work Robustly for Scribble-Based Weakly Supervised Ultrasound Image Segmentation | | Link | Code | Medical Image Segmentation |
MICCAI 24.10.03 | MetaUNETR: Rethinking Token Mixer Encoding for Efficient Multi-organ Segmentation | | Link | Code | Medical Image Segmentation |
MICCAI 24.10.06 | PathMamba: Weakly Supervised State Space Model for Multi-class Segmentation of Pathology Images | | Link | Code | Medical Image Segmentation |
MICCAI 24.10.06 | Efficient and Gender-adaptive Graph Vision Mamba for Pediatric Bone Age Assessment | | Link | Code | Bone Age Assessment |
MICCAI 24.10.06 | Polyp-Mamba: Polyp Segmentation with Visual Mamba | | Link | | Medical Image Segmentation |
TMI 24.10.07 | Unleash the Power of State Space Model for Whole Slide Image with Local Aware Scanning and Importance Resampling | | Link | Code | Cancer Subtyping/Survival Prediction |
Arxiv 24.10.20 | Taming Mambas for Voxel Level 3D Medical Image Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.10.29 | Advancing Efficient Brain Tumor Multi-Class Classification -- New Insights from the Vision Mamba Model in Transfer Learning | | Link | | Multi-Class Classification |
Arxiv 24.10.31 | MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.11.12 | CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising | | Link | | Denoising |
Arxiv 24.11.13 | MambaXCTrack: Mamba-based Tracker with SSM Cross-correlation and Motion Prompt for Ultrasound Needle Tracking | | Link | | Ultrasound Needle Tracking |
Arxiv 24.11.14 | When Mamba Meets xLSTM: An Efficient and Precise Method with the XLSTM-VMUNet Model for Skin lesion Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.11.18 | KAN-Mamba FusionNet: Redefining Medical Image Segmentation with Non-Linear Modeling | | Link | | Medical Image Segmentation |
Arxiv 24.11.20 | Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning | | Link | | Medical Representation Learning |
TMI 24.11.28 | Swin-UMamba+: Adapting Mamba-based vision foundation models for medical image segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.12.01 | 2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification | | Link | Code | Medical Image Classification |
Arxiv 24.12.02 | MambaU-Lite: A Lightweight Model based on Mamba and Integrated Channel-Spatial Attention for Skin Lesion Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.12.03 | Segmentation of Coronary Artery Stenosis in X-ray Angiography using Mamba Models | | Link | | Medical Image Segmentation |
Arxiv 24.12.11 | SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation | | Link | Code | Medical Image Segmentation |
Arxiv 24.12.19 (AAAI 2025) | S3Mamba: Small-Size-Sensitive Mamba for Lesion Segmentation | | Link | Code | Medical Image Segmentation |
Information Fusion 25.03 | Polyp-Mamba: A Hybrid Multi-Frequency Perception Gated Selection Network for polyp segmentation | | Link | | Medical Image Segmentation |
Video
Date | Paper | Figure | Link | Code | Task |
---|---|---|---|---|---|
Arxiv 24.01.25 | Vivim: a Video Vision Mamba for Medical Video Object Segmentation | <img width="596" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/e30c0ceb-5399-44b5-99b7-65ada043c87c"> | Link | Code | Medical Video Segmentation |
Arxiv 24.03.11 (ECCV 2024) | VideoMamba: State Space Model for Efficient Video Understanding | <img width="728" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/08797465-f93f-49ce-b724-91b67fabbabd"> | Link | Code | Action Recognition/Video Understanding/Text-to-video Retrieval |
Arxiv 24.03.12 (ICLR 2024 Workshop) | SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces | <img width="655" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/1b1ce7b5-392c-46dd-b4e2-d6e03f6af1ab"> | Link | Code | Video Generation |
Arxiv 24.03.14 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | <img width="704" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/70fb7829-d28e-4bbc-b326-fcb167dad979"> | Link | Code | Action Recognition/Action Localization/... |
Arxiv 24.03.25 (CVPR 2024 Workshop) | VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting | | Link | Code | Spatiotemporal Forecasting |
Arxiv 24.04.09 | RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos | <img width="881" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/f1b0f8a1-f10f-43c6-8203-701ae0376af2"> | Link | Code | Remote photoplethysmography Prediction |
Arxiv 24.04.11 | Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos | <img width="697" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/ea35cf6a-e2a6-4eab-8da7-2cb7cd098507"> | Link | | Skeleton Action Recognition |
Arxiv 24.05.05 | Matten: Video Generation with Mamba-Attention | | Link | | Video Generation |
Arxiv 24.05.30 | DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | | Link | Code | AI-Generated Video Detection |
Arxiv 24.06.18 | Slot State Space Models | | Link | | Object-centric Video Understanding/3D Visual Reasoning/Video Prediction |
Arxiv 24.06.27 | VideoMambaPro: A Leap Forward for Mamba in Video Understanding | | Link | Code | Video Understanding |
Arxiv 24.07.02 (NeurIPS 2024) | VFIMamba: Video Frame Interpolation with State Space Models | | Link | Code | Video Frame Interpolation |
Arxiv 24.07.03 | BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement | | Link | Code | Low-Light Video Enhancement |
Arxiv 24.07.04 | QueryMamba: A Mamba-Based Encoder-Decoder Architecture with a Statistical Verb-Noun Interaction Module for Video Action Forecasting @ Ego4D Long-Term Action Anticipation Challenge 2024 | | Link | | Video Action Forecasting |
Arxiv 24.07.11 (ECCV 2024) | VideoMamba: Spatio-Temporal Selective State Space Model | | Link | Code | Action Recognition |
Arxiv 24.07.25 | Harnessing Temporal Causality for Advanced Temporal Action Detection | | Link | Code | Moment Queries/Action Recognition/Action Detection/Audio-Based Interaction Detection |
Arxiv 24.07.31 (ACM MM 2024 Oral) | RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining | | Link | Code | Deraining |
Arxiv 24.08.15 | MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking | | Link | | RGB-T Tracking |
Arxiv 24.08.17 (ACM MM 2024 Oral) | MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model | | Link | | Multiple Object Tracking |
Arxiv 24.08.20 | DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba | | Link | | Video Demoireing |
Arxiv 24.08.31 | TrackSSM: A General Motion Predictor by State-Space Model | | Link | | Motion Prediction |
Arxiv 24.09.02 | FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking | | Link | | Fish Tracking |
Arxiv 24.09.04 | MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos | | Link | Code | Hand Trajectory Prediction |
JSTAR 24.09.11 | TrackingMamba: Visual State Space Model for Object Tracking | | Link | Code | Object Tracking |
Arxiv 24.09.18 (CCBR 2024) | PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba | | Link | Code | Remote Photoplethysmography |
NeurIPS 24.09.26 | MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging | | Link | Code | Snapshot Compressive Imaging |
NeurIPS 24.09.26 | Toward Dynamic Non-Line-of-Sight Imaging with Mamba Enforced Temporal Consistency | | Link | | Dynamic Reconstruction |
Arxiv 24.10.18 | MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging | | Link | Code | Video Snapshot Compressive Imaging |
ACM MM 24.10.28 | Object-Level Pseudo-3D Lifting for Distance-Aware Tracking | | Link | | Tracking |
Arxiv 24.11.03 | Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation | | Link | Code | Medical Video Generation |
Arxiv 24.11.23 | MUFM: A Mamba-Enhanced Feedback Model for Micro Video Popularity Prediction | | Link | | Popularity Prediction |
Arxiv 24.11.29 | Look Every Frame All at Once: Video-Ma2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing | | Link | Code | Video Understanding |
Arxiv 24.12.11 (AAAI 2025) | Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence | | Link | Code | Action Recognition |
Arxiv 24.12.15 (AAAI 2025) | Exploring Enhanced Contextual Information for Video-Level Object Tracking | | Link | Code | Tracking |
Arxiv 24.12.18 (AAAI 2025) | Robust Tracking via Mamba-based Context-aware Token Learning | | Link | Code | Tracking |
Arxiv 24.12.18 | MambaLCT: Boosting Tracking via Long-term Context State Space Model | | Link | Code | Tracking |
Arxiv 24.12.19 (AAAI 2025) | Efficient Self-Supervised Video Hashing with Selective State Spaces | | Link | Code | Hashing |
Point Cloud
Date | Paper | Figure | Link | Code | Task |
---|---|---|---|---|---|
Arxiv 24.02.16 (NeurIPS 2024) | PointMamba: A Simple State Space Model for Point Cloud Analysis | <img width="718" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/e252e787-0189-4f94-bea1-2944d50b18f4"> | Link | Code | Classification, Part Segmentation |
Arxiv 24.02.23 (CVPR 2024 Spotlight, SSM) | State Space Models for Event Cameras | | Link | Code | Object Detection |
Arxiv 24.03.01 | Point Cloud Mamba: Point Cloud Learning via State Space Model | <img width="692" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/6a315d04-afe6-41d1-b8d5-d931a891a681"> | Link | Code | Classification, Part Segmentation, Semantic Segmentation |
Arxiv 24.03.11 | Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy | <img width="882" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/c1c7c020-28cc-4ca6-b271-1d3cf665243f"> | Link | Code | Classification, Semantic Segmentation |
Arxiv 24.04.08 | 3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering | <img width="1028" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/ab137b17-85c1-4b6c-96d4-9ae5bfd45a1b"> | Link | | Point Cloud Filtering |
Arxiv 24.04.10 | 3DMambaComplete: Exploring Structured State Space Model for Point Cloud Completion | <img width="1020" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/da19fb01-52bd-4a55-b0ca-9681fdaef9ed"> | Link | | Point Cloud Completion |
Arxiv 24.04.19 (ACM MM 2024) | MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model | | Link | Code | Object Segmentation |
Arxiv 24.04.23 (ACM MM 2024) | Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model | <img width="959" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/6b565138-1c2d-4201-bd34-8b4343a62ec9"> | Link | Code | Classification, Part Segmentation |
Arxiv 24.05.09 | Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba | <img width="1528" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/57466105/2a7422a9-9483-4b5c-be57-5b2c04f4b614"> | Link | Classification, Regression | |
Arxiv 24.05.13 | OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition | Link | Code | LiDAR Place Recognition | |
Arxiv 24.05.23 | MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Link | Point Cloud Video Understanding | ||
Arxiv 24.05.24 | PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis | Link | Code | Classification, Part Segmentation | |
Arxiv 24.05.27 (NeurIPS 2024) | LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling | Link | Classification, Part Segmentation, Object Detection | ||
Arxiv 24.06.07 | Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs | Link | Code | Generation | |
Arxiv 24.06.10 | PointABM: Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis | Link | Classification | ||
Arxiv 24.06.15 (NeurIPS 2024) | Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection | Link | Code | Object Detection | |
Arxiv 24.06.25 | Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model | Link | Semantic Segmentation | ||
Arxiv 24.07.15 | Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model | Link | Semantic Segmentation, Instance Segmentation | ||
Arxiv 24.07.25 | LION: Linear Group RNN for 3D Object Detection in Point Clouds | Link | Code | Object Detection | |
Arxiv 24.08.19 | Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms | Link | Code | Action Recognition | |
Arxiv 24.08.20 | MambaEVT: Event Stream based Visual Object Tracking using State Space Model | Link | Code | Object Tracking | |
Arxiv 24.08.20 | MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation | Link | Code | Object Segmentation | |
Arxiv 24.08.20 | OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model | Link | Code | Semantic Prediction/Scene Completion | |
Arxiv 24.09.17 | Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation | Link | Object Detection | ||
Arxiv 24.09.24 | FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving | Link | 4D Occupancy Forecasting | ||
NeurIPS 24.09.26 | 3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection | Link | Object Detection | ||
Arxiv 24.10.21 | MBPU: A Plug-and-Play State Space Model for Point Cloud Upsamping with Fast Point Rendering | Link | Upsamping | ||
Arxiv 24.10.22 | SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition | | Link | Code | Action Recognition |
Arxiv 24.10.24 | Bio2Token: All-atom tokenization of any biomolecular structure with Mamba | | Link | | Tokenization |
ICIP 24.10.27 | Mamba-PCGC: Mamba-Based Point Cloud Geometry Compression | | Link | | Geometry Compression |
Arxiv 24.10.28 | Exploring contextual modeling with linear complexity for point cloud segmentation | | Link | | Semantic Segmentation |
Arxiv 24.10.31 | NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs | | Link | | Classification, Part Segmentation |
Arxiv 24.11.06 | Towards 3D Semantic Scene Completion for Autonomous Driving: A Meta-Learning Framework Empowered by Deformable Large-Kernel Attention and Mamba Model | | Link | | Semantic Scene Completion |
Arxiv 24.11.19 | STREAM: A Universal State-Space Model for Sparse Geometric Data | | Link | | Classification |
Multi-Modal
Date | Paper | Figure | Link | Code | Task | Modality |
---|---|---|---|---|---|---|
Arxiv 24.01.25 | MambaMorph: a Mamba-based Framework for Medical MR-CT Deformable Registration | <img width="705" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/0584bfee-1ed2-4d5b-984e-c374491adab9"> | Link | Code | Registration | MRI & CT |
Arxiv 24.02.19 | Pan-Mamba: Effective pan-sharpening with State Space Model | <img width="716" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/9cead6ad-ce09-4597-a985-8181b407523d"> | Link | Code | Pansharpening | HISR Images & LRMS Images |
Arxiv 24.03.07 (ECCV 2024) | InstructGIE: Towards Generalizable Image Editing | <img width="912" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/45b0c86f-f473-4eb7-a821-7be8e3be417d"> | Link | Code | Image Editing | Image & Text |
Arxiv 24.03.12 (ECCV 2024) | Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM | <img width="910" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/9ef9c705-657d-4b6d-b229-6e2e4270682f"> | Link | Code | Text-to-Motion Generation | Motion & Text |
Arxiv 24.03.14 (NeurIPS 2024) | MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models | | Link | | Gesture Synthesis | |
Arxiv 24.03.20 | VL-Mamba: Exploring State Space Models for Multimodal Learning | <img width="718" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/aa912eb8-13a7-488f-9601-d298ed6796e2"> | Link | Code | MLLM tasks | Image & Text |
Arxiv 24.03.21 | Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference | <img width="626" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/df845d03-3739-4b78-9328-6c2df2e98aad"> | Link | Code | MLLM tasks | Image & Text |
Arxiv 24.03.26 (ECCV 2024) | ReMamber: Referring Image Segmentation with Mamba Twister | <img width="715" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/08c3e6e4-49ca-4081-bea6-ed4c7b046c0b"> | Link | Code | Referring Image Segmentation | Image & Text |
Arxiv 24.04.01 | SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding | <img width="727" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/04cba5ac-b2f0-4357-b447-1e14a1d2617b"> | Link | | Temporal Video Grounding | Video & Text |
Arxiv 24.04.05 (WACV 2025) | Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation | <img width="702" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/a972740b-774c-4e14-a914-791aa5f519b8"> | Link | Code | Semantic Segmentation | RGB Images & Depth/Thermal Images |
Arxiv 24.04.07 | VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module | <img width="711" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/6d47fc18-f044-49ed-8724-941a4fe46ebc"> | Link | Code | Registration | MRI & CT |
Arxiv 24.04.11 | SurvMamba: State Space Model with Multi-grained Multi-modal Interaction for Survival Prediction | <img width="813" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/c9307069-4672-47ed-9706-1003a5ad5eff"> | Link | | Cancer Subtyping/Survival Prediction | WSIs & Gene |
Arxiv 24.04.11 (TGRS 2024) | Efficient Remote Sensing Image Fusion With State Space Model | <img width="816" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/2182cfb2-fa6f-4dea-ab2b-d21b906a683f"> | Link | Code | Pansharpening | HISR Images & LRMS Images |
Arxiv 24.04.12 | MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion | <img width="1035" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/5921c2d4-d50d-48a3-928a-1dc69a60deb6"> | Link | | Multi-modality Image Fusion | RGB & Thermal Images, MRI & CT/PET/SPECT |
Arxiv 24.04.14 | Fusion-Mamba for Cross-modality Object Detection | <img width="902" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/97b716a5-f647-43d9-a1fe-dc8b2b02670d"> | Link | | Visible-infrared Images Fusion | RGB Images & Infrared Images |
Arxiv 24.04.14 | A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion | <img width="1013" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/df942cab-7802-4314-b7a7-549439b74f06"> | Link | | Pansharpening | HISR Images & LRMS Images |
Arxiv 24.04.15 | FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba | <img width="906" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/87093676-286c-4e35-89b7-7e573679cc67"> | Link | Code | Image Fusion | RGB & Infrared Images, MRI & CT/PET/SPECT, PC & GFP |
Arxiv 24.04.17 | Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion | <img width="810" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/f27aa176-65e2-44db-a172-56712e789729"> | Link | | Temporal Grounding | Motion & Text |
Arxiv 24.04.25 | CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions | | Link | Code | Visible-infrared Images Fusion | RGB Images & Infrared Images |
Arxiv 24.04.27 | Revisiting Multi-modal Emotion Learning with Broad State Space Models and Probability-guidance Fusion | | Link | | Multi-modal Emotion Recognition | Text & Video & Audio |
Arxiv 24.04.28 (PRCV 2024) | Mamba-FETrack: Frame-Event Tracking via State Space Model | | Link | Code | RGB-Event Tracking | RGB Frames & Event |
Arxiv 24.04.29 (GRSL 2024) | RSCaMa: Remote Sensing Image Change Captioning with State Space Model | | Link | Code | Image Captioning | Remote Sensing Image & Text |
Arxiv 24.04.30 | CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation | | Link | Code | OOD | Image & Text |
Arxiv 24.05.22 | I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling | | Link | Code | Medical Image Generation | MRI/CT |
Arxiv 24.05.24 (NeurIPS 2024) | Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | | Link | Code | Large Language and Vision Model | Image & Text (Question/Rationale) |
Arxiv 24.05.29 (NeurIPS 2024) | Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model | | Link | | Multi-modal Sentiment Analysis | Text & Video & Audio |
Arxiv 24.05.31 | S4Fusion: Saliency-aware Selective State Space Model for Infrared Visible Image Fusion | <img width="539" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/74030174/15ab4235-93bd-4e6d-b836-79142bfa84ec"> | Link | | Image Fusion | RGB Images & Infrared Images |
Arxiv 24.06.02 | MGI: Multimodal Contrastive Pre-training of Genomic and Medical Imaging | | Link | | Multimodal Contrastive Pre-training | Medical Image & Genomic |
Arxiv 24.06.03 | Dimba: Transformer-Mamba Diffusion Models | | Link | Code | Text to Image Generation | Image & Text |
Arxiv 24.06.06 (NeurIPS 2024) | RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation | | Link | Code | Robot Reasoning and Manipulation | Image & Text |
Arxiv 24.06.10 | MVGamba: Unify 3D Content Generation as State Space Sequence Modeling | | Link | | 3D Generation | Image & Text |
Arxiv 24.07.02 | MMR-Mamba: Multi-Contrast MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion | | Link | | Image Fusion | Multi-Contrast MRI |
Arxiv 24.07.14 | InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation | | Link | Code | Text-to-Motion Generation | Motion & Text |
Arxiv 24.07.15 | An Empirical Study of Mamba-based Pedestrian Attribute Recognition | | Link | Code | Pedestrian Attribute Recognition | Image & Text |
Arxiv 24.07.15 | OPa-Ma: Text Guided Mamba for 360-degree Image Out-painting | | Link | | 360-degree Image Out-painting | Image & Text |
Arxiv 24.07.22 | GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI | | Link | Code | AD Progression Assessment | MRI & PET |
Arxiv 24.07.29 | ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 | | Link | Code | MLLM Tasks | Image & Text |
Arxiv 24.07.29 (ACM MM 2024) | MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion | | Link | | Co-Speech Gesture Generation | Motion & Audio |
Arxiv 24.08.01 | DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework | | Link | Code | Co-Speech Gesture Generation | Motion & Audio |
Arxiv 24.08.02 (ITSC 2024) | MambaST: A Plug-and-Play Cross-Spectral Spatial-Temporal Fuser for Efficient Pedestrian Detection | | Link | Code | Pedestrian Detection | RGB & Thermal Images |
Arxiv 24.08.02 | PhysMamba: Leveraging Dual-Stream Cross-Attention SSD for Remote Physiological Measurement | | Link | | Remote Physiological Measurement | Video & rPPG |
Arxiv 24.08.03 | JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Language Model | | Link | | 3D Talking Head Generation | Motion & Audio |
Arxiv 24.08.07 | DRAMA: An Efficient End-to-end Motion Planner for Autonomous Driving with Mamba | | Link | | Motion Planning | Image & Text |
Arxiv 24.08.15 | ColorMamba: Towards High-quality NIR-to-RGB Spectral Translation with Mamba | | Link | Code | NIR-to-RGB Translation | NIR Images & RGB Images |
Arxiv 24.08.16 | RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba | | Link | | RGBT Tracking | RGB Videos & TIR Videos |
Arxiv 24.08.19 | R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation | | Link | Code | Medical Report Generation | Image & Text |
Arxiv 24.08.19 | OccMamba: Semantic Occupancy Prediction with State Space Models | | Link | | Semantic Occupancy Prediction | LiDAR Points & RGB Images |
Arxiv 24.08.20 | Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm | | Link | Code | Event Stream based Sign Language Translation | Event & Text |
Arxiv 24.08.20 | MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval | | Link | Code | Text-video Retrieval | Video & Text |
Arxiv 24.08.22 | Adapt CLIP as Aggregation Instructor for Image Dehazing | | Link | | Dehazing | Image & Text |
Arxiv 24.08.27 | DualKanbaFormer: Kolmogorov-Arnold Networks and State Space Model Transformer for Multimodal Aspect-based Sentiment Analysis | | Link | | Multi-modal Sentiment Analysis | Image & Text |
Arxiv 24.08.28 | MambaPlace: Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms | | Link | Code | Cross-Modal Place Recognition | Point Cloud & Text |
TGRS 24.08.30 | Mask-Guided Mamba Fusion for Drone-based Visible-Infrared Vehicle Detection | | Link | | Cross-Modal Detection | RGB Images & Infrared Images |
Arxiv 24.09.03 | PixelBytes: Catching Unified Embedding for Multimodal Generation | | Link | Code | Multi-Modal Generation | Image & Text |
Arxiv 24.09.03 | Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion | | Link | | Multi-Modality Image Fusion | HISR Images & LRMS Images, MRI & CT/PET/SPECT |
Arxiv 24.09.04 | LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | | Link | Code | MLLM Tasks | Image & Text |
Arxiv 24.09.05 | Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion | | Link | | Multi-Modality Image Fusion | RGB & Thermal Images, MRI & CT/PET/SPECT |
Arxiv 24.09.08 | Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | | Link | Code | Multi-modal Emotion Recognition | Text & Audio & Video |
Arxiv 24.09.09 | Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling | | Link | Code | MLLM Tasks | Image & Text |
Arxiv 24.09.11 | Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models | | Link | Code | Action Prediction | Point Cloud & Robot State |
TGRS 24.09.12 | Joint Classification of Hyperspectral and LiDAR Data Based on Mamba | | Link | Code | Classification | HSI Images & LiDAR Points |
Arxiv 24.09.13 | Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection | | Link | Code | Open-Vocabulary Detection | Image & Text |
Arxiv 24.09.17 | Mamba Fusion: Learning Actions Through Questioning | | Link | Code | Action Prediction/Action Anticipation | Video & Text |
Arxiv 24.09.22 | GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning | | Link | Code | Grasp Detection | Image & Text |
Arxiv 24.09.24 | DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection | | Link | Code | Multi-modal Depression Detection | Video & Audio |
Arxiv 24.09.30 | MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation | | Link | | Image Generation | Image & Text |
Arxiv 24.10.01 | CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | | Link | Code | Medical Report Generation | Image & Text |
Arxiv 24.10.04 | HMT-Grasp: A Hybrid Mamba-Transformer Approach for Robot Grasping in Cluttered Environments | | Link | | Robot Grasping | RGB-D Image & Grasp |
MICCAI 24.10.07 | LM-UNet: Whole-Body PET-CT Lesion Segmentation with Dual-Modality-Based Annotations Driven by Latent Mamba U-Net | | Link | Code | Medical Image Segmentation | PET & CT |
Arxiv 24.10.08 | EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment | | Link | | MLLM tasks | Image & Text |
Arxiv 24.10.10 | Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation | | Link | | Style-Specific Chinese Calligraphy Generation | Image & Text |
Arxiv 24.10.17 | RemoteDet-Mamba: A Hybrid Mamba-CNN Network for Multi-modal Object Detection in Remote Sensing Images | | Link | | Object Detection | RGB Images & Infrared Images |
Arxiv 24.10.19 | MambaSOD: Dual Mamba-Driven Cross-Modal Fusion Network for RGB-D Salient Object Detection | | Link | Code | Salient Object Detection | RGB Images & Depth Images |
Arxiv 24.10.21 | LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | | Link | | Dehazing | Image & Text |
Arxiv 24.10.21 (ISBI 2025) | R2Gen-Mamba: A Selective State Space Model for Radiology Report Generation | | Link | Code | Radiology Report Generation | Image & Text |
GRSL 24.10.23 | A Mamba-Diffusion Framework for Multimodal Remote Sensing Image Semantic Segmentation | | Link | Code | Semantic Segmentation | VIS Images & SAR Images |
TIV 24.10.24 | SeqMamba-MPR: A Spatial-Temporal Mamba Network for Place Recognition Using Sequential Multi-Modal Data | | Link | | Place Recognition | LiDAR points & RGB Images |
GRSL 24.10.30 | S2CrossMamba: Spatial–Spectral Cross-Mamba for Multimodal Remote Sensing Image Classification | | Link | Code | Classification | HISR Images & LRMS Images |
Arxiv 24.11.03 | MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration | | Link | | Registration | RGB Images & Infrared Images |
Arxiv 24.11.10 | KMM: Key Frame Mask Mamba for Extended Motion Generation | | Link | Code | Text-to-Motion Generation | Motion & Text |
Arxiv 24.11.13 | Multimodal Instruction Tuning with Hybrid State Space Models | | Link | | Zero-shot Multimodal/VQA | Image/Video & Text |
Arxiv 24.11.18 | RAWMamba: Unified sRGB-to-RAW De-rendering With State Space Model | | Link | | De-rendering | RGB Images & RAW Images |
Arxiv 24.11.23 | MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking | | Link | | Vision-Language Tracking | Image & Text |
Arxiv 24.11.23 | DiM-Gestor: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 | | Link | Code | Co-Speech Gesture Generation | Motion & Audio |
Scientific Reports 24.11.26 | ReMamba: a hybrid CNN-Mamba aggregation network for visible-infrared person re-identification | | Link | | Visible-infrared Re-identification | RGB Images & Infrared Images |
TCSVT 24.11.28 | MDNet: Mamba-Effective Diffusion-Distillation Network for RGB-Thermal Urban Dense Prediction | | Link | Code | Dense Prediction | RGB Images & Thermal Images |
Arxiv 24.12.01 | AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment | | Link | | Multi-Modality Fusion | Video & Text & Audio |
Arxiv 24.12.09 | MSCrackMamba: Leveraging Vision Mamba for Crack Detection in Fused Multispectral Imagery | | Link | | Crack Detection | RGB Images & Infrared Images |
Arxiv 24.12.11 (AAAI 2025) | LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba | | Link | | Occupancy Prediction | Image & Text |
Arxiv 24.12.13 | LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity | | Link | | Text-to-Video Generation | Video & Text |
Arxiv 24.12.13 | Selective State Space Memory for Large Vision-Language Models | | Link | | MLLM Tasks | Image & Text |
Arxiv 24.12.14 (AAAI 2025) | MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt | | Link | Code | Object Re-Identification | RGB Images & Near Infrared Images & Thermal Infrared Images |
Arxiv 24.12.15 | OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation | | Link | | 3D Scene Generation | Images/Videos & Occupancy Grids |
Arxiv 24.12.15 (AAAI 2025) | Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation | | Link | Code | Text-to-motion Generation | Motion & Text |
Information Fusion 25.03 | An efficient cross-view image fusion method based on selected state space and hashing for promoting urban perception | | Link | | Cross-view Geolocation | Street-view Images & Aerial-view Images |
Others
Date | Paper | Figure | Link | Code | Task |
---|---|---|---|---|---|
Arxiv 24.02.24 | Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning | <img width="683" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/e8ed3e23-e305-4b8a-a706-0601c1ef3b1b"> | Link | Code | Food Classification |
Arxiv 24.03.08 | Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy | <img width="943" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/4cce4533-8d35-4acc-8cb3-6ad44603dc04"> | Link | Code | Endoscope Tip Tracking |
Arxiv 24.03.22 | Music to Dance as Language Translation using Sequence Models | <img width="541" alt="image" src="https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models/assets/88369000/3e647680-22c7-4885-9ada-f32d9288f1ba"> | Link | Code | Music-to-Dance |
Valuable Insights
Date | Paper | Link |
---|---|---|
Arxiv 24.03.03 | The Hidden Attention of Mamba Models | Link |
Arxiv 24.03.15 | On the low-shot transferability of [V]-Mamba? | Link |
Arxiv 24.03.16 | Understanding Robustness of Visual State Space Models for Image Classification | Link |
Arxiv 24.05.13 | MambaOut: Do We Really Need Mamba for Vision? | Link |
Arxiv 24.05.26 (NeurIPS 2024) | Demystify Mamba in Vision: A Linear Attention Perspective | Link |
Arxiv 24.05.26 | A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models | Link |
Arxiv 24.06.11 (NeurIPS 2024) | MambaLRP: Explaining Selective State Space Sequence Models | Link |
Arxiv 24.06.13 | Towards Evaluating the Robustness of Visual State Space Models | Link |
Other Domains
Reinforcement Learning
Date | Paper | Link | Code |
---|---|---|---|
Arxiv 24.03.25 (IROS 2024) | Proprioception Is All You Need: Terrain Classification for Boreal Forests | Link | Code | |
Arxiv 24.05.20 (NeurIPS 2024) | Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | Link | ||
Arxiv 24.05.31 | Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling | Link | ||
Arxiv 24.06.04 | Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning | Link | Code | |
Arxiv 24.06.08 (NeurIPS 2024) | Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL | Link | ||
Arxiv 24.06.12 | MaIL: Improving Imitation Learning with Mamba | Link | ||
Arxiv 24.06.21 | KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty | Link | ||
Arxiv 24.08.05 | Context-aware Mamba-based Reinforcement Learning for social robot navigation | Link | ||
Arxiv 24.08.20 | Integrating Multi-Modal Input Token Mixer Into Mamba-Based Decision Models: Decision MetaMamba | Link | Code | |
Arxiv 24.09.04 | Mamba as a motion encoder for robotic imitation learning | Link | ||
CoRL 24.09.06 | MaIL: Improving Imitation Learning with Selective State Space Models | Link | https://github.com/ALRhub/MaIL | |
TMRB 24.09.20 | Visuomotor Policy Learning for Task Automation of Surgical Robot | Link | Code | |
Arxiv 24.09.23 | DiSPo: Diffusion-SSM based Policy Learning for Coarse-to-Fine Action Discretization | Link | ||
NeurIPS 24.09.26 | Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling | Link | ||
Arxiv 24.10.11 | Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient | Link | Code | |
Arxiv 24.10.25 | Multi-Agent Reinforcement Learning with Selective State-Space Models | Link | Code | |
Arxiv 24.10.29 | A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks | Link | Code | |
Arxiv 24.11.29 | RMIO: A Model-Based MARL Framework for Scenarios with Observation Loss in Some Agents | Link | ||
Arxiv 24.12.01 | Decision Transformer vs. Decision Mamba: Analysing the Complexity of Sequential Decision Making in Atari Games | Link | Code |
Graph Learning
Date | Paper | Link | Code |
---|---|---|---|
Arxiv 24.02.13 (KDD 2024) | Graph Mamba: Towards Learning on Graphs with State Space Models | Link | Code | |
Arxiv 24.05.22 | HeteGraph-Mamba: Heterogeneous Graph Learning via Selective State Space Model | Link | ||
KDD 2024 Workshop 24.06.29 | Identifying Subphenotypes for Sepsis with Acute Kidney Injury via Multimodal Graph State Space Models | Link | ||
Arxiv 24.08.08 | DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models | Link | ||
Arxiv 24.08.13 | DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs | Link | ||
Arxiv 24.09.18 | Topological Deep Learning with State-Space Models: A Mamba Approach for Simplicial Complexes | Link | ||
Arxiv 24.12.11 (AAAI 2025) | DG-Mamba: Robust and Efficient Dynamic Graph Structure Learning with Selective State Space Models | Link | Code |
Audio
Date | Paper | Link | Code |
---|---|---|---|
Arxiv 24.03.12 (IEEE SPL 2024) | Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers | Link | Code | |
Arxiv 24.04.02 | SPMamba: State-space model is all you need in speech separation | Link | ||
Arxiv 24.05.02 | TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms | Link | ||
Arxiv 24.05.10 | An Investigation of Incorporating Mamba for Speech Enhancement | Link | ||
Arxiv 24.05.20 | SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | Link | Code | |
Arxiv 24.05.21 | Mamba in Speech: Towards an Alternative to Self-Attention | Link | ||
Arxiv 24.05.22 | Audio Mamba: Pretrained Audio State Space Model For Audio Tagging | Link | Code | |
Arxiv 24.06.04 (Interspeech 2024) | Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations | Link | Code | |
Arxiv 24.06.05 | Audio Mamba: Bidirectional State Space Model for Audio Representation Learning | Link | Code | |
Arxiv 24.06.10 (Interspeech 2024) | RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection | Link | Code | |
Arxiv 24.06.24 (Interspeech 2024) | Exploring the Capability of Mamba in Speech Applications | Link | ||
Arxiv 24.07.13 | Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis | Link | Mamba-TasNet Code, ConMamba Code | |
Arxiv 24.08.09 | SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation | Link | ||
Arxiv 24.09.04 | MusicMamba: A Dual-Feature Modeling Approach for Generating Chinese Traditional Music with Modal Precision | Link | ||
Arxiv 24.09.04 (SLT 2024 Workshop) | An Analysis of Linear Complexity Attention Substitutes with BEST-RQ | Link | ||
Arxiv 24.09.07 | Cross-attention Inspired Selective State Space Models for Target Sound Extraction | Link | ||
Arxiv 24.09.08 | TF-Mamba: A Time-Frequency Network for Sound Source Localization | Link | ||
Arxiv 24.09.09 | Vector Quantized Diffusion Model Based Speech Bandwidth Extension | Link | ||
Arxiv 24.09.10 | A Two-Stage Band-Split Mamba-2 Network for Music Separation | Link | Code | |
Arxiv 24.09.11 | Rethinking Mamba in Speech Processing by Self-Supervised Models | Link | Code | |
Arxiv 24.09.13 | MambaFoley: Foley Sound Generation using Selective State-Space Models | Link | Code | |
Arxiv 24.09.14 | Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution | Link | ||
Arxiv 24.09.15 | Self-supervised Learning for Acoustic Few-Shot Classification | Link | ||
Arxiv 24.09.16 | Ultra-Low Latency Speech Enhancement - A Comprehensive Study | Link | ||
Arxiv 24.09.16 | Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement | Link | ||
Arxiv 24.09.18 | Dense-TSNet: Dense Connected Two-Stage Structure for Ultra-Lightweight Speech Enhancement | Link | Code | |
Arxiv 24.09.19 | DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification | Link | ||
Arxiv 24.09.26 | MC-SEMamba: A Simple Multi-channel Extension of SEMamba | Link | ||
Arxiv 24.09.27 (SLT 2024) | Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models | Link | Code | |
Arxiv 24.09.30 | Mamba for Streaming ASR Combined with Unimodal Aggregation | Link | Code | |
Arxiv 24.10.01 | Zero-Shot Text-to-Speech from Continuous Text Streams | Link | Code | |
Arxiv 24.10.09 (ICASSP 2025) | Mamba-based Segmentation Model for Speaker Diarization | Link | Code | |
Arxiv 24.10.09 | Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity | Link | ||
Arxiv 24.10.14 | CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning | Link | Code | |
Arxiv 24.10.28 | SepMamba: State-space models for speaker separation using Mamba | Link | Code | |
Arxiv 24.11.09 | Selective State Space Model for Monaural Speech Enhancement | Link | ||
Arxiv 24.11.11 (SLT 2024) | Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition | Link | Code | |
Arxiv 24.11.11 (LAMIR 2024 Workshop) | AEROMamba: An efficient architecture for audio super-resolution using generative adversarial networks and state space models | Link | Code | |
Arxiv 24.11.12 | SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model | Link | ||
Arxiv 24.11.14 | MASV: Speaker Verification with Global and Local Context Mamba | Link | ||
Expert Systems with Applications 24.12.15 | A barking emotion recognition method based on Mamba and Synchrosqueezing Short-Time Fourier Transform | Link | Code | |
Arxiv 24.11.15 | XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection | Link | ||
Arxiv 24.12.15 | A Comparative Study on Dynamic Graph Embedding based on Mamba and Transformers | Link | ||
Arxiv 24.12.17 | TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification | Link | Code | |
Arxiv 24.11.21 | BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection | Link | Code |
Time Series
Date | Paper | Link | Code |
---|---|---|---|
Arxiv 24.03.14 (ECAI 2024) | TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting | Link | Code | |
Arxiv 24.04.23 | SST: Multi-Scale Hybrid Mamba-Transformer Experts for Long-Short Range Time Series Forecasting | Link | Code | |
Arxiv 24.04.23 | Integrating Mamba and Transformer for Long-Short Range Time Series Forecasting | Link | ||
Arxiv 24.04.24 | Bi-Mamba+: Bidirectional Mamba for Time Series Forecasting | Link | ||
Arxiv 24.05.11 | DTMamba : Dual Twin Mamba for Time Series Forecasting | Link | ||
Arxiv 24.05.25 | Time-SSM: Simplifying and Unifying State Space Models for Time Series Forecasting | Link | ||
Arxiv 24.05.26 | MambaTS: Improved Selective State Space Models for Long-term Time Series Forecasting | Link | ||
Arxiv 24.06.06 (NeurIPS 2024) | Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models | Link | ||
Arxiv 24.06.06 | TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification | Link | ||
Arxiv 24.06.08 | C-Mamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting | Link | Code | |
Arxiv 24.06.17 (IJCAI 2024 Workshop) | SpoT-Mamba: Learning Long-Range Dependency on Spatio-Temporal Graphs with Selective State Spaces | Link | Code | |
Arxiv 24.07.15 | MSegRNN:Enhanced SegRNN Model with Mamba for Long-Term Time Series Forecasting | Link | ||
Arxiv 24.07.20 | FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting | Link | ||
Arxiv 24.08.04 | Mamba-Spike: Enhancing the Mamba Architecture with a Spiking Front-End for Efficient Temporal Data Processing | Link | Code | |
Arxiv 24.08.22 (CGI24) | Simplified Mamba with Disentangled Dependency Encoding for Long-Term Time Series Forecasting | Link | ||
Arxiv 24.08.27 | Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need | Link | Code | |
Arxiv 24.09.13 (ICECCE 2024) | Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics | Link | ||
IOTJ 24.09.18 | HARMamba: Efficient and Lightweight Wearable Sensor Human Activity Recognition Based on Bidirectional Mamba | Link | ||
Arxiv 24.09.21 | Test Time Learning for Time Series Forecasting | Link | ||
Arxiv 24.09.30 | A SSM is Polymerized from Multivariate Time Series | Link | Code | |
Arxiv 24.09.30 (SLT 2024) | SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding | Link | Code | |
GRSL 24.10.07 | SPPMamba: State Space Models for Seismic Phase Arrival Picking | Link | ||
Arxiv 24.10.08 | TIMBA: Time series Imputation with Bi-directional Mamba Blocks and Diffusion models | Link | ||
Arxiv 24.10.12 | Mamba4Cast: Efficient Zero-Shot Time Series Forecasting with State Space Models | Link | Code | |
Arxiv 24.10.13 | SlimSeiz: Efficient Channel-Adaptive Seizure Prediction Using a Mamba-Enhanced Network | Link | Code | |
Arxiv 24.10.15 | UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba | Link | ||
Arxiv 24.10.17 | DiffImp: Efficient Diffusion Model for Probabilistic Time Series Imputation with Bidirectional Mamba Backbone | Link | ||
Arxiv 24.10.28 | FACTS: A Factored State-Space Framework For World Modelling | Link | Code | |
Arxiv 24.10.28 | Neural Hamilton: Can A.I. Understand Hamiltonian Mechanics? | Link | Code | |
Arxiv 24.10.30 (NeurIPS 2024 Workshop) | Sequential Order-Robust Mamba for Time Series Forecasting | Link | Code | |
Arxiv 24.11.03 | BiT-MamSleep: Bidirectional Temporal Mamba for EEG Sleep Staging | Link | ||
Arxiv 24.11.05 | A Mamba Foundation Model for Time Series Forecasting | Link | ||
Scientific Reports 24.11.11 | Application of multi-modal temporal neural network based on enhanced sparrow optimization in lithium battery life prediction | Link | ||
Arxiv 24.11.19 | Contrast Similarity-Aware Dual-Pathway Mamba for Multivariate Time Series Node Classification | Link | Code | |
Arxiv 24.11.26 | MTS-UNMixers: Multivariate Time Series Forecasting via Channel-Time Dual Unmixing | Link | Code | |
Arxiv 24.11.28 | MSEMG: Surface Electromyography Denoising with a Mamba-based Efficient Network | Link | Code | |
Scientific Reports 24.11.29 | Mastering seismic time series response predictions using an attention-Mamba transformer model for bridge bearings and piers across varied testing conditions | Link | ||
Arxiv 24.12.06 | MSECG: Incorporating Mamba for Robust and Efficient ECG Super-Resolution | Link | ||
Arxiv 24.12.10 | Bidirectional Mamba state-space model for anomalous diffusion | Link | Code | |
Arxiv 24.12.12 | Optimising TinyML with Quantization and Distillation of Transformer and Mamba Models for Indoor Localisation on Edge Devices | Link | Code | |