Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
<div align="center"> <img width="1421" alt="Meissonic Banner" src="https://github.com/user-attachments/assets/703f6882-163a-42d0-8da8-3680231ca75e"> </div>

Introduction
Meissonic is a non-autoregressive text-to-image synthesis model based on masked image modeling (MIM). It generates high-resolution images and is designed to run on consumer graphics cards.
Key Features:
- High-resolution image generation (up to 1024x1024)
- Designed to run on consumer GPUs
- Versatile applications: text-to-image, image-to-image
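To make "non-autoregressive masked image modeling" more concrete: masked generative transformers usually start from a fully masked grid of image tokens, predict all tokens in parallel at each step, keep the most confident predictions, and re-mask the rest. The sketch below shows only the bookkeeping of such a loop, using the cosine schedule popularized by MaskGIT. It is an illustration under stated assumptions, not Meissonic's actual sampler: the function name, the 4096-token grid, and the choice of schedule are hypothetical.

```python
import math


def masked_tokens_per_step(num_tokens: int, num_steps: int) -> list[int]:
    """Return how many tokens are still masked after each decoding step.

    Assumes a cosine schedule (as in MaskGIT); the real model may differ.
    """
    remaining = []
    for step in range(1, num_steps + 1):
        # The masked ratio falls from ~1 to 0 as decoding progresses.
        ratio = math.cos(math.pi / 2 * (step / num_steps))
        remaining.append(max(0, math.floor(ratio * num_tokens)))
    return remaining


# Hypothetical setting: 4096 image tokens decoded in 64 steps.
schedule = masked_tokens_per_step(num_tokens=4096, num_steps=64)
print("masked after first step:", schedule[0])
print("masked after last step:", schedule[-1])
```

Because every step fills in many tokens at once, the number of model calls equals the number of steps rather than the number of tokens, which is the main contrast with autoregressive decoding.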
Prerequisites
Step 1: Clone the repository
git clone https://github.com/viiika/Meissonic/
cd Meissonic
Step 2: Create a virtual environment
conda create --name meissonic python
conda activate meissonic
pip install -r requirements.txt
Step 3: Install diffusers
git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
cd ..
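After the three steps above, a quick check that the expected packages are visible to the active environment can save a failed first run. The following is a minimal sketch using only the standard library. The package list is an assumption (the contents of `requirements.txt` are not shown here), and the helper names are hypothetical.

```python
from importlib import metadata
from typing import Dict, List, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None when absent)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Assumed package names; adjust to match requirements.txt.
    for name, version in report(["diffusers", "torch"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated `meissonic` environment; a `NOT INSTALLED` line for `diffusers` suggests Step 3 was executed in a different environment.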
Usage
Gradio Web UI
python app.py
Command-line Interface
Text-to-Image Generation
python inference.py --prompt "Your creative prompt here"
Inpainting and Outpainting
python inpaint.py --mode inpaint --input_image path/to/image.jpg
python inpaint.py --mode outpaint --input_image path/to/image.jpg
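The command-line entry points above can also be scripted, for example to run several prompts in a row. The sketch below only builds the argument lists, using the flags shown above (`--prompt`, `--mode`, `--input_image`); the helper names are hypothetical, and any further options the scripts may accept are not covered.

```python
import shlex
import sys
from typing import List


def text_to_image_cmd(prompt: str) -> List[str]:
    """Argument list for the text-to-image script shown above."""
    return [sys.executable, "inference.py", "--prompt", prompt]


def inpaint_cmd(mode: str, input_image: str) -> List[str]:
    """Argument list for inpainting or outpainting a single image."""
    if mode not in ("inpaint", "outpaint"):
        raise ValueError(f"mode must be 'inpaint' or 'outpaint', got {mode!r}")
    return [sys.executable, "inpaint.py", "--mode", mode, "--input_image", input_image]


prompts = [
    "A pillow with a picture of a Husky on it.",
    "A white coffee mug, a solid black background",
]
for prompt in prompts:
    # Print the shell-quoted command; to execute it, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(text_to_image_cmd(prompt)))
```

Passing the arguments as a list avoids shell-quoting problems with prompts that contain commas, quotes, or spaces.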
Advanced: FP8 Quantization
Reduce GPU memory usage with FP8 quantization:
Requirements:
- CUDA 12.4
- PyTorch 2.4.1
- TorchAO
Note: Windows users should install TorchAO with:
pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cpu
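Before running the FP8 scripts, it may help to compare the local setup against the versions listed above. The sketch below treats CUDA 12.4 and PyTorch 2.4.1 as the documented combination and reports whether the installed release matches. Whether newer versions also work is not stated here, so a mismatch is a hint rather than an error. The helper names are hypothetical.

```python
import re
from typing import Tuple

# Versions listed in the requirements above.
DOCUMENTED = {"torch": "2.4.1", "cuda": "12.4"}


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release, e.g. '2.4.1+cu124' -> (2, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def matches_documented(installed: str, documented: str) -> bool:
    """True if the installed release starts with the documented release."""
    want = release_tuple(documented)
    return release_tuple(installed)[: len(want)] == want


try:
    import torch
except ImportError:  # The check still runs on machines without PyTorch.
    torch = None

if torch is not None:
    print("torch", torch.__version__,
          "matches" if matches_documented(torch.__version__, DOCUMENTED["torch"]) else "differs")
    if torch.version.cuda is not None:  # None on CPU-only builds
        print("cuda", torch.version.cuda,
              "matches" if matches_documented(torch.version.cuda, DOCUMENTED["cuda"]) else "differs")
```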
Command-line inference
python inference_fp8.py --quantization fp8
Gradio Web UI for FP8 (select the quantization method under Advanced settings)
python app_fp8.py
Performance Benchmarks
| Precision (Steps=64, Resolution=1024x1024) | Avg. Time (Batch Size=1) | Memory Usage |
|---|---|---|
| FP32 | 13.32s | 12GB |
| FP16 | 12.35s | 9.5GB |
| FP8 | 12.93s | 8.7GB |
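The trade-off in the table is easier to see as relative changes. The short calculation below uses only the numbers from the table; the helper name is hypothetical.

```python
# (average seconds per image, memory in GB), copied from the table above.
benchmarks = {
    "FP32": (13.32, 12.0),
    "FP16": (12.35, 9.5),
    "FP8": (12.93, 8.7),
}


def relative_saving(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline` (negative if higher)."""
    return (baseline - value) / baseline * 100.0


base_time, base_mem = benchmarks["FP32"]
for name, (seconds, gigabytes) in benchmarks.items():
    print(
        f"{name}: time {relative_saving(base_time, seconds):.1f}% lower, "
        f"memory {relative_saving(base_mem, gigabytes):.1f}% lower than FP32"
    )
```

Relative to FP32, FP8 uses about 27.5% less memory and FP16 about 20.8% less. FP8 is slightly slower than FP16 in this benchmark (12.93s versus 12.35s, roughly 4.7% more time), so its benefit here is memory rather than speed.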
Showcase
<div align="center"> <img src="https://github.com/user-attachments/assets/b30a7912-5453-48ba-aff4-bfb547bbe626" width="320" alt="A pillow with a picture of a Husky on it."> <p><i>"A pillow with a picture of a Husky on it."</i></p> </div>
<div align="center"> <img src="https://github.com/user-attachments/assets/b23a1603-399d-40d6-8e16-c077d3d12a08" width="320" alt="A white coffee mug, a solid black background"> <p><i>"A white coffee mug, a solid black background"</i></p> </div>

Citation
If you find this work helpful, please consider citing:
@article{bai2024meissonic,
title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
journal={arXiv preprint arXiv:2410.08261},
year={2024}
}
@article{shao2024bagdesignchoicesinference,
title={Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer},
author={Shitong Shao and Zikai Zhou and Tian Ye and Lichen Bai and Zhiqiang Xu and Zeke Xie},
journal={arXiv preprint arXiv:2411.10781},
year={2024}
}
Acknowledgements
We thank the community and contributors for their invaluable support in developing Meissonic. We thank apolinario@multimodal.art for creating the Meissonic demo. We thank @NewGenAI and @飛鷹しずか@自称文系プログラマの勉強 for making YouTube tutorials. We thank @pprp for implementing FP8 and INT4 quantization. We thank @camenduru for the Jupyter tutorial. We thank @chenxwh for the Replicate demo and API. We thank Collov Labs for reproducing Monetico. We thank Shitong et al. for identifying effective design choices for enhancing visual quality.
<p align="center"> <a href="https://star-history.com/#viiika/Meissonic&Date"> <img src="https://api.star-history.com/svg?repos=viiika/Meissonic&type=Date" alt="Star History Chart"> </a> </p> <p align="center"> Made with ❤️ by MeissonFlow Research </p>