<p align="center"> <img src="asset/logo.jpg" height=200> </p>

<div align="center"> 🚀 Dimba: Transformer-Mamba Diffusion Models </div>
<div align="center"> <a href="https://github.com/feizc/Dimba/"><img src="https://img.shields.io/static/v1?label=Dimba Code&message=Github&color=blue&logo=github-pages"></a>   <a href="https://dimba-project.github.io/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=red&logo=github-pages"></a>   <a href="https://huggingface.co/feizhengcong/Dimba"><img src="https://img.shields.io/static/v1?label=models&message=HF&color=yellow"></a>   <a href="https://huggingface.co/datasets/feizhengcong/Dimba"><img src="https://img.shields.io/static/v1?label=dataset&message=HF&color=green"></a>   </div> <div align="center"> <a href="http://arxiv.org/abs/2406.01159"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:Dimba&color=purple&logo=arxiv"></a>   <a href="https://www.tiangong.cn/chat/text_gen_image/004"><img src="https://img.shields.io/static/v1?label=Demo&message=Demo:Dimba&color=orange&logo=demo"></a>   </div>

This repo contains PyTorch model definitions, pre-trained weights, and inference/sampling code for our paper Transformer-Mamba Diffusion Models. You can find more visualizations on our [project page](https://dimba-project.github.io/).
<b> TL;DR: Dimba is a new text-to-image diffusion model that employs a hybrid architecture combining Transformer and Mamba elements, thus capitalizing on the advantages of both architectural paradigms.</b>
## 1. Environments

- Python 3.10

  ```bash
  conda create -n your_env_name python=3.10
  ```

- Requirements file

  ```bash
  pip install -r requirements.txt
  ```

- Install `causal_conv1d` and `mamba` (a quick import check follows the list)

  ```bash
  pip install -e causal_conv1d
  pip install -e mamba
  ```
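After installation, the minimal sketch below can verify that the CUDA build of PyTorch and the locally built Mamba kernels import cleanly. The module names `causal_conv1d` and `mamba_ssm` are assumptions based on the upstream packages that the two local folders vendor, not names defined in this repo.

```python
# Optional environment sanity check (module names are assumed, not repo-specific).
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# Assumed import names for the locally built extensions
# (pip install -e causal_conv1d / pip install -e mamba).
import causal_conv1d
import mamba_ssm

print("causal_conv1d and mamba_ssm imported successfully")
```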
## 2. Download Models
Models reported in the paper can be downloaded directly as follows (uploads in progress):
| Model | #Params | URL |
|---|---|---|
| t5 | 4.3B | huggingface |
| vae | 80M | huggingface |
| Dimba-L-512 | 0.9B | huggingface |
| Dimba-L-1024 | 0.9B | - |
| Dimba-L-2048 | 0.9B | - |
| Dimba-G-512 | 1.8B | - |
| Dimba-G-1024 | 1.8B | - |
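The released checkpoints can also be fetched programmatically. The sketch below is a minimal example using `huggingface_hub`, assuming the weights live in the `feizhengcong/Dimba` model repo linked in the badges above; the local directory name is purely illustrative.

```python
# Minimal sketch: mirror the released Dimba checkpoints locally.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="feizhengcong/Dimba",    # model repo from the badges above
    local_dir="checkpoints/dimba",   # hypothetical local destination
)
print("checkpoints downloaded to:", ckpt_dir)
```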
The dataset used for quality tuning to enhance aesthetic performance can be downloaded here:
| Dataset | Size | URL |
|---|---|---|
| Quality tuning | 600k | huggingface |
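The quality-tuning data can be mirrored the same way. The sketch below assumes the `feizhengcong/Dimba` dataset repo linked above and only downloads the raw files; the on-disk layout of the data is not specified here.

```python
# Minimal sketch: mirror the quality-tuning dataset files locally.
from huggingface_hub import snapshot_download

data_dir = snapshot_download(
    repo_id="feizhengcong/Dimba",
    repo_type="dataset",                     # dataset repo shares the same id
    local_dir="data/dimba_quality_tuning",   # hypothetical local destination
)
print("dataset files downloaded to:", data_dir)
```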
## 3. Inference
We include an inference script that samples images from a Dimba model according to textual prompts. It supports the DDIM and DPM-Solver sampling algorithms. You can run it as:
```bash
python scripts/inference.py \
    --image_size 512 \
    --model_version dimba-l \
    --model_path /path/to/model \
    --txt_file asset/examples.txt \
    --save_path /path/to/save/results
```
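`--txt_file` points to a plain-text prompt list. The exact contents of `asset/examples.txt` are not reproduced here, but one prompt per line is the assumed format, e.g.:

```text
a cyberpunk city street at night, neon lights, rain-slicked pavement
an oil painting of a lighthouse on a cliff at sunset
```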
## 4. Training
We provide a training script for Dimba in `scripts/train.py`. This script can be used for fine-tuning with different settings. You can run it as:
```bash
python -m torch.distributed.launch --nnodes=4 --nproc_per_node=8 \
    --master_port=1234 scripts/train.py \
    configs/dimba_xl2_img512.py \
    --work-dir outputs
```
## 5. BibTeX
```bibtex
@misc{fei2024dimba,
      title={Dimba: Transformer-Mamba Diffusion Models},
      author={Zhengcong Fei and Mingyuan Fan and Changqian Yu and Debang Li and Youqiang Zhang and Junshi Huang},
      year={2024},
      eprint={2406.01159},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
## 6. Acknowledgments
The codebase is based on the awesome PixArt, Vim, and DiS repos.