Home

Awesome

<center>Web-based UI for Stable Diffusion</center>

Created by Sygil.Dev

Join us at Sygil.Dev's Discord Server Generic badge

Installation instructions for:

Want to ask a question or request a feature?

Come to our Discord Server or use Discussions.

Documentation

Documentation is located here

Want to contribute?

Check the Contribution Guide

Sygil-Dev main devs:

Project Features:

SD WebUI

An easy way to work with Stable Diffusion right from your browser.

Streamlit

Features:

Prompt Weights & Negative Prompts:

To give a token (tag recognized by the AI) a specific or increased weight (emphasis), add :0.## to the prompt, where 0.## is a decimal that will specify the weight of all tokens before the colon. Ex: cat:0.30, dog:0.70 or guy riding a bicycle :0.7, incoming car :0.30

Negative prompts can be added by using ### , after which any tokens will be seen as negative. Ex: cat playing with string ### yarn will negate yarn from the generated image.

Negatives are a very powerful tool to get rid of contextually similar or related topics, but be careful when adding them since the AI might see connections you can't, and end up outputting gibberish

*Tip: Try using the same seed with different prompt configurations or weight values see how the AI understands them, it can lead to prompts that are more well-tuned and less prone to error.

Please see the Streamlit Documentation to learn more.

Gradio [Legacy]

Features:

Note: the Gradio interface is no longer being actively developed by Sygil.Dev and is only receiving bug fixes.

Please see the Gradio Documentation to learn more.

Image Upscalers


GFPGAN

Lets you improve faces in pictures using the GFPGAN model. There is a checkbox in every tab to use GFPGAN at 100%, and also a separate tab that just allows you to use GFPGAN on any picture, with a slider that controls how strong the effect is.

If you want to use GFPGAN to improve generated faces, you need to install it separately. Download GFPGANv1.4.pth and put it into the /sygil-webui/models/gfpgan directory.

RealESRGAN

Lets you double the resolution of generated images. There is a checkbox in every tab to use RealESRGAN, and you can choose between the regular upscaler and the anime version. There is also a separate tab for using RealESRGAN on any picture.

Download RealESRGAN_x4plus.pth and RealESRGAN_x4plus_anime_6B.pth. Put them into the sygil-webui/models/realesrgan directory.

LSDR

Download LDSR project.yaml and model last.cpkt. Rename last.ckpt to model.ckpt and place both under sygil-webui/models/ldsr/

GoBig, and GoLatent (Currently on the Gradio version Only)

More powerful upscalers that uses a separate Latent Diffusion model to more cleanly upscale images.

Please see the Post-Processing Documentation to learn more.


Original Information From The Stable Diffusion Repo:

Stable Diffusion

Stable Diffusion was made possible thanks to a collaboration with Stability AI and Runway and builds upon our previous work:

High-Resolution Image Synthesis with Latent Diffusion Models Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer

CVPR '22 Oral

which is available on GitHub. PDF at arXiv. Please also visit our Project page.

Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See this section below and the model card.

Stable Diffusion v1

Stable Diffusion v1 refers to a specific configuration of the model architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and then finetuned on 512x512 images.

*Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present in its training data. Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding model card.

Comments

BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models},
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}