
GlobalMamba: Global Image Serialization for Vision Mamba

This repository is the official implementation of GlobalMamba: Global Image Serialization for Vision Mamba.

Paper

GlobalMamba: Global Image Serialization for Vision Mamba

Chengkun Wang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Motivation of GlobalMamba

Vim and VMamba adopt flattening strategies like (a) and (b), converting two-dimensional images into one-dimensional sequences by row or column, while LocalMamba (c) performs the same flattening within local windows. Notably, these sequences lack the inherent causal ordering of tokens that Mamba's causal architecture assumes. In contrast, GlobalMamba (d) constructs a causal token sequence ordered by frequency, ensuring that each token acquires global feature information.

Overall framework of GlobalMamba

(Figure: the overall framework of GlobalMamba.)

Training Environment
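The code builds on Vision Mamba (Vim), so a working environment needs PyTorch together with the Mamba CUDA kernels. One plausible setup (exact versions are not pinned here; see the Vim repository for the tested configuration) installs the public PyPI releases of the acknowledged dependencies:

pip install causal-conv1d

pip install mamba-ssm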

Train Your GlobalMamba

bash globalmamba/scripts/tiny.sh

bash globalmamba/scripts/small.sh

The commands above train GlobalMamba on the Vim backbone. The frequency-based reordering of the original token sequence is implemented in the models_mamba.py file, so you only need to port that part to other vision Mamba frameworks for comparison; a rough sketch of the idea follows.
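As an illustration of the serialization idea (not the repository's actual models_mamba.py code), the sketch below orders patch tokens from low to high frequency. It uses torch.fft as a stand-in for the frequency transform, and the function name frequency_serialize, the radial band-splitting scheme, and all parameters are illustrative assumptions:

```python
# A minimal, hypothetical sketch of frequency-ordered serialization.
# NOT the repository's implementation; names and the band scheme are illustrative.
import torch

def frequency_serialize(img: torch.Tensor, num_bands: int = 4,
                        patch: int = 16) -> torch.Tensor:
    """img: (B, C, H, W), with H and W divisible by `patch`.
    Returns tokens ordered from the lowest-frequency band-limited view of the
    image to the highest, shape (B, num_bands * N, C * patch * patch)."""
    B, C, H, W = img.shape
    # Move the image into the frequency domain (fftshift puts DC at the center).
    freq = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    # Radial distance of each coefficient from the DC component.
    yy = torch.arange(H, device=img.device).view(-1, 1) - H // 2
    xx = torch.arange(W, device=img.device).view(1, -1) - W // 2
    radius = torch.sqrt(yy.float() ** 2 + xx.float() ** 2)
    r_max = radius.max()
    tokens = []
    for b in range(1, num_bands + 1):
        # Growing low-pass window: band b keeps frequencies up to b / num_bands.
        mask = (radius <= r_max * b / num_bands).to(freq.dtype)
        band_img = torch.fft.ifft2(
            torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
        # Flatten the band-limited image into non-overlapping patch tokens.
        t = band_img.unfold(2, patch, patch).unfold(3, patch, patch)
        t = t.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)
        tokens.append(t)
    # Low-frequency tokens come first, so each token only follows tokens built
    # from coarser, more global information in the causal sequence.
    return torch.cat(tokens, dim=1)
```

With a 224x224 input, patch=16, and num_bands=4, this yields 4 x 196 = 784 tokens ordered coarse to fine, which reflects the causal, global-first ordering the paper motivates.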

Results

(Figure: experimental results of GlobalMamba.)

Acknowledgement

This project is based on Vision Mamba (code), Mamba (code), Causal-Conv1d (code), and DeiT (code). Thanks for their wonderful work.

Citation

If you find this project helpful, please consider citing the following paper:

@article{wang2024globalmamba,
    title={GlobalMamba: Global Image Serialization for Vision Mamba},
    author={Chengkun Wang and Wenzhao Zheng and Jie Zhou and Jiwen Lu},
    journal={arXiv preprint arXiv:2410.10316},
    year={2024}
}