# DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models 🔥

<a href='https://sungnyun.github.io/diffblender/'><img src='https://img.shields.io/badge/Project-Page-yellow'></a> <a href='https://arxiv.org/abs/2305.15194'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='#bibtex'><img src='https://img.shields.io/badge/Paper-BibTex-Green'></a> <a href='https://huggingface.co/sungnyun/diffblender'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DiffBlender_Model-blue'></a>

<p align="center"> <img width="1369" alt="teaser" src="./assets/fig1.png"> </p>

## 🗓️ TODOs

## 🚀 Getting Started

Install the necessary packages with:

```bash
$ pip install -r requirements.txt
```

Download the DiffBlender model checkpoint from the [Hugging Face model](https://huggingface.co/sungnyun/diffblender) and place it under `./diffblender_checkpoints/`.
Also, prepare the base Stable Diffusion checkpoint (we used `CompVis/sd-v1-4.ckpt`).
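
If you prefer to fetch the DiffBlender checkpoint programmatically, below is a minimal sketch using `huggingface_hub`; the checkpoint filename is a placeholder, so check the model repository for the actual file name.

```python
# Minimal sketch: download the DiffBlender checkpoint from the Hugging Face Hub.
# NOTE: the filename below is a placeholder -- see
# https://huggingface.co/sungnyun/diffblender for the actual checkpoint file name.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="sungnyun/diffblender",
    filename="diffblender.pth",               # placeholder checkpoint name
    local_dir="./diffblender_checkpoints",    # path expected by inference.py
)
print(f"Checkpoint downloaded to: {ckpt_path}")
```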

## ⚡️ Try Multimodal T2I Generation with DiffBlender

```bash
$ python inference.py --ckpt_path=./diffblender_checkpoints/{CKPT_NAME}.pth \
                      --official_ckpt_path=/path/to/sd-v1-4.ckpt \
                      --save_name={SAVE_NAME}
```

Results will be saved under `./inference/{SAVE_NAME}/`, each in the format of {conditions + generated image}.
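
For a quick sanity check on the outputs, a simple sketch like the following can list and open them; it assumes the results are saved as standard image files (e.g., `.png`) under `./inference/{SAVE_NAME}/`.

```python
# Minimal sketch: inspect images produced by inference.py.
# Assumes outputs under ./inference/{SAVE_NAME}/ are standard image files (e.g., .png).
from pathlib import Path
from PIL import Image

save_name = "demo"  # replace with the SAVE_NAME you passed to inference.py
out_dir = Path("./inference") / save_name
for img_path in sorted(out_dir.glob("*.png")):
    img = Image.open(img_path)
    print(f"{img_path.name}: {img.size}")
```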

## BibTeX

```bibtex
@article{kim2023diffblender,
  title={DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models},
  author={Kim, Sungnyun and Lee, Junsoo and Hong, Kibeom and Kim, Daesik and Ahn, Namhyuk},
  journal={arXiv preprint arXiv:2305.15194},
  year={2023}
}
```