# Prompt-Free Diffusion
This repo hosts the official implementation of:
Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, and Humphrey Shi, *Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models*, arXiv:2305.16223.
## News
- [2023.06.20]: An SDWebUI plugin has been created; repo at this link
- [2023.05.25]: Our demo is running on Hugging Face 🤗
- [2023.05.25]: Repo created
## Introduction
Prompt-Free Diffusion is a diffusion model that relies on only visual inputs to generate new images. It replaces the commonly used CLIP-based text encoder with a Semantic Context Encoder (SeeCoder). SeeCoder is reusable with most public T2I models as well as adaptive layers such as ControlNet, LoRA, T2I-Adapter, etc. Just drop it in and play! A minimal sketch of the drop-in idea follows below.
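To make the drop-in idea concrete, here is a minimal, hypothetical sketch (the class and shapes are illustrative, not this repo's actual API): the T2I U-Net's cross-attention only consumes a `[batch, tokens, dim]` context tensor, so any encoder producing embeddings of that shape can stand in for the CLIP text encoder.

```python
import torch
import torch.nn as nn

class SeeCoderStub(nn.Module):
    """Toy stand-in for SeeCoder: maps a reference image to a sequence of
    context embeddings shaped like CLIP text-encoder output
    ([batch, 77, 768] for SD v1.x), which is all the U-Net's
    cross-attention actually consumes."""
    def __init__(self, ctx_dim: int = 768):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d((7, 11))         # 7 * 11 = 77 tokens
        self.proj = nn.Conv2d(3, ctx_dim, kernel_size=1)  # per-token embedding

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feat = self.proj(self.pool(image))      # [B, 768, 7, 11]
        return feat.flatten(2).transpose(1, 2)  # [B, 77, 768]

encoder = SeeCoderStub()
reference = torch.randn(1, 3, 512, 512)  # visual input only, no text prompt
context = encoder(reference)             # [1, 77, 768]
# The T2I U-Net itself is untouched: `context` goes exactly where CLIP
# text embeddings would normally go, e.g.
# noise_pred = unet(latents, timestep, encoder_hidden_states=context)
```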
<p align="center">
  <img src="assets/figures/reusability.png" width="90%">
</p>

## Performance

<p align="center">
  <img src="assets/figures/qualitative_show.png" width="99%">
</p>

## Network

<p align="center">
  <img src="assets/figures/prompt_free_diffusion.png" width="60%">
</p>
<p align="center">
  <img src="assets/figures/seecoder.png" width="99%">
</p>

## Setup
```
conda create -n prompt-free-diffusion python=3.10
conda activate prompt-free-diffusion
pip install torch==2.0.0+cu117 torchvision==0.15.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
```
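As a quick sanity check (plain PyTorch, nothing repo-specific), you can verify the pinned build sees your GPU:

```python
import torch

print(torch.__version__)          # expected: 2.0.0+cu117
print(torch.cuda.is_available())  # expected: True on a CUDA 11.7 machine
```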
## Demo
We provide a WebUI powered by Gradio. Start the WebUI with the following command:

```
python app.py
```
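Unless configured otherwise, Gradio serves on http://localhost:7860 by default; check the console output of `app.py` for the exact address, since the host and port depend on how it calls `launch()`.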
## Pretrained models
To support the full functionality of our demo, you need the following models located at these paths:
```
└── pretrained
    ├── pfd
    │   ├── vae
    │   │   └── sd-v2-0-base-autokl.pth
    │   ├── diffuser
    │   │   ├── AbyssOrangeMix-v2.safetensors
    │   │   ├── AbyssOrangeMix-v3.safetensors
    │   │   ├── Anything-v4.safetensors
    │   │   ├── Deliberate-v2-0.safetensors
    │   │   ├── OpenJouney-v4.safetensors
    │   │   ├── RealisticVision-v2-0.safetensors
    │   │   └── SD-v1-5.safetensors
    │   └── seecoder
    │       ├── seecoder-v1-0.safetensors
    │       ├── seecoder-pa-v1-0.safetensors
    │       └── seecoder-anime-v1-0.safetensors
    ├── controlnet
    │   ├── control_sd15_canny_slimmed.safetensors
    │   ├── control_sd15_depth_slimmed.safetensors
    │   ├── control_sd15_hed_slimmed.safetensors
    │   ├── control_sd15_mlsd_slimmed.safetensors
    │   ├── control_sd15_normal_slimmed.safetensors
    │   ├── control_sd15_openpose_slimmed.safetensors
    │   ├── control_sd15_scribble_slimmed.safetensors
    │   ├── control_sd15_seg_slimmed.safetensors
    │   ├── control_v11p_sd15_canny_slimmed.safetensors
    │   ├── control_v11p_sd15_lineart_slimmed.safetensors
    │   ├── control_v11p_sd15_mlsd_slimmed.safetensors
    │   ├── control_v11p_sd15_openpose_slimmed.safetensors
    │   ├── control_v11p_sd15s2_lineart_anime_slimmed.safetensors
    │   └── control_v11p_sd15_softedge_slimmed.safetensors
    └── preprocess
        ├── hed
        │   └── ControlNetHED.pth
        ├── midas
        │   └── dpt_hybrid-midas-501f0c75.pt
        ├── mlsd
        │   └── mlsd_large_512_fp32.pth
        ├── openpose
        │   ├── body_pose_model.pth
        │   ├── facenet.pth
        │   └── hand_pose_model.pth
        └── pidinet
            └── table5_pidinet.pth
```
All models can be downloaded from this HuggingFace link.
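If you prefer to script the download, here is a minimal sketch using `huggingface_hub` (the `repo_id` below is an assumption; substitute the repository behind the link above):

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="shi-labs/prompt-free-diffusion",  # assumed repo id -- verify against the link above
    local_dir="pretrained",                    # matches the directory tree above
)
```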
## Tools
We also provide tools to convert pretrained models from sdwebui and the diffusers library to this codebase. Please see the following files:
```
└── tools
    ├── get_controlnet.py
    └── model_conversion.py
```
You are expected to do some custom coding to make them work (e.g., changing the hardcoded input/output file paths).
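The kind of customization meant above usually looks like the following hypothetical sketch: load a checkpoint, remap its state-dict keys, and write it back. The paths and the key prefix are placeholders, not the actual logic of the tools above.

```python
from safetensors.torch import load_file, save_file

# Hardcoded paths -- placeholders to be edited, per the note above.
SRC = "AbyssOrangeMix-v2-sdwebui.safetensors"  # hypothetical sdwebui checkpoint
DST = "pretrained/pfd/diffuser/AbyssOrangeMix-v2.safetensors"

state = load_file(SRC)
# Hypothetical remap: drop an sdwebui-style prefix so the keys line up
# with what this codebase expects.
converted = {k.removeprefix("model.diffusion_model."): v for k, v in state.items()}
save_file(converted, DST)
```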
## Performance on Anime
<p align="center">
  <img src="assets/figures/anime.png" width="70%">
</p>

## Citation
```bibtex
@article{xu2023prompt,
  title={Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models},
  author={Xu, Xingqian and Guo, Jiayi and Wang, Zhangyang and Huang, Gao and Essa, Irfan and Shi, Humphrey},
  journal={arXiv preprint arXiv:2305.16223},
  year={2023}
}
```
## Acknowledgement

Part of this codebase reorganizes/reimplements code from the following repositories: the Versatile Diffusion official GitHub and the ControlNet sdwebui GitHub, which are in turn heavily influenced by the LDM official GitHub and the DDPM official GitHub.