# Auto 1111 SDK: Stable Diffusion Python library
<p> <a href="https://pepy.tech/project/auto1111sdk"> <img alt="PyPI downloads" src="https://static.pepy.tech/badge/auto1111sdk"> </a> </p>

Auto 1111 SDK is a lightweight Python library for generating, upscaling, and editing images with Stable Diffusion models. It is designed as a modular, lightweight Python client that encapsulates all the main features of the [Automatic 1111 Stable Diffusion Web UI](https://github.com/AUTOMATIC1111/stable-diffusion-webui). Auto 1111 SDK currently offers three core features:

- Text-to-Image, Image-to-Image, Inpainting, and Outpainting pipelines. Our pipelines support the exact same parameters as the Stable Diffusion Web UI, so you can easily replicate Web UI creations with the SDK.
- Upscaling pipelines that can run inference for any ESRGAN or Real-ESRGAN upscaler in a few lines of code.
- An integration with Civitai to download models directly from the website (see the sketch after this list).
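As a minimal sketch of the Civitai download and upscaling workflow (the `civit_download`, `download_realesrgan`, and `RealEsrganPipeline` names below follow the project's documentation, but treat the exact signatures as assumptions and check the docs before relying on them):

```python
from PIL import Image
from auto1111sdk import civit_download, download_realesrgan, RealEsrganPipeline

# Download a checkpoint from Civitai by its model-page URL (URL is a placeholder)
civit_download("https://civitai.com/models/<model id>", "model.safetensors")

# Download a Real-ESRGAN checkpoint and run a 4x upscale with it
download_realesrgan("R-ESRGAN 4x+", "realesrgan.pth")
upscaler = RealEsrganPipeline("R-ESRGAN 4x+", "realesrgan.pth")
upscaled = upscaler.upscale(Image.open("<PATH TO IMAGE>"), 4)
upscaled.save("upscaled.png")
```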
Join our Discord!!
## Demo
We have a Colab demo where you can run many of the operations of Auto 1111 SDK. Check it out here!
## Installation
We recommend installing Auto 1111 SDK from PyPI in a virtual environment. We do not yet support conda environments.
```
pip3 install auto1111sdk
```
To install the latest version of Auto 1111 SDK (with ControlNet support now included), run:
```
pip3 install git+https://github.com/saketh12/Auto1111SDK.git
```
## Quickstart
Generating images with Auto 1111 SDK is super easy. Text-to-Image, Image-to-Image, Inpainting, Outpainting, and Stable Diffusion Upscale all run through a single pipeline object, which saves a lot of RAM compared to solutions that require a separate pipeline per operation.
```python
from auto1111sdk import StableDiffusionPipeline

# Load a model from a local .safetensors or .ckpt file
pipe = StableDiffusionPipeline("<Path to your local safetensors or checkpoint file>")

prompt = "a picture of a brown dog"
output = pipe.generate_txt2img(prompt=prompt, height=1024, width=768, steps=10)

output[0].save("image.png")
```
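The same pipeline object also covers the other modes. Below is a minimal Image-to-Image sketch; the `generate_img2img` method name and its `init_image` parameter follow the project's documentation, but verify them there before use:

```python
from PIL import Image
from auto1111sdk import StableDiffusionPipeline

pipe = StableDiffusionPipeline("<Path to your local safetensors or checkpoint file>")

# Start generation from an existing image instead of pure noise
init_image = Image.open("image.png")
output = pipe.generate_img2img(init_image=init_image, prompt="a picture of a golden dog", steps=10)

output[0].save("image_img2img.png")
```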
## ControlNet
Right now, ControlNet only works with fp32. We are adding support for fp16 very soon.
```python
from auto1111sdk import StableDiffusionPipeline, ControlNetModel

# The ControlNet model is referenced by its file name, without the extension
model = ControlNetModel(model="<THE CONTROLNET MODEL FILE NAME (WITHOUT EXTENSION)>",
                        image="<PATH TO IMAGE>")

pipe = StableDiffusionPipeline("<Path to your local safetensors or checkpoint file>", controlnet=model)

prompt = "a picture of a brown dog"
output = pipe.generate_txt2img(prompt=prompt, height=1024, width=768, steps=10)

output[0].save("image.png")
```
## Running on Windows
Find the instructions here. Contributed by Marco Guardigli, mgua@tomware.it
## Documentation
We have more detailed examples and documentation of how you can use Auto 1111 SDK here. For a detailed comparison between Auto 1111 SDK and Hugging Face Diffusers, you can read this.

For a detailed guide on how to use SDXL, we recommend reading this.
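As a rough sketch of what SDXL usage looks like (the `StableDiffusionXLPipeline` class name comes from the feature list below; its constructor and `generate_txt2img` method are assumed here to mirror `StableDiffusionPipeline`, so verify against the guide):

```python
from auto1111sdk import StableDiffusionXLPipeline

# Assumed to mirror StableDiffusionPipeline: load an SDXL checkpoint from disk
pipe = StableDiffusionXLPipeline("<Path to your local SDXL safetensors file>")

output = pipe.generate_txt2img(prompt="a picture of a brown dog", height=1024, width=1024, steps=10)
output[0].save("sdxl_image.png")
```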
## Features
- Original txt2img and img2img modes
- Real-ESRGAN and ESRGAN upscaling (compatible with any .pth file)
- Outpainting
- Inpainting
- Stable Diffusion Upscale
- Attention: specify parts of text that the model should pay more attention to (see the example after this list)
    - a man in a `((tuxedo))` - will pay more attention to tuxedo
    - a man in a `(tuxedo:1.21)` - alternative syntax
    - select text and press `Ctrl+Up` or `Ctrl+Down` (or `Command+Up` or `Command+Down` if you're on macOS) to automatically adjust attention to selected text (code contributed by anonymous user)
- Composable Diffusion: a way to use multiple prompts at once
    - separate prompts using uppercase `AND`
    - also supports weights for prompts: `a cat :1.2 AND a dog AND a penguin :2.2`
- Works with a variety of samplers
- Download Stable Diffusion models and Real-ESRGAN checkpoints directly from Civitai
- Set custom VAE: works for any model including SDXL
- Support for SDXL with Stable Diffusion XL Pipelines
- Pass in custom arguments to the models
- No 77-token prompt limit (unlike Hugging Face Diffusers, which has this limit)
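The attention and Composable Diffusion syntax goes directly into the prompt string. Here is a minimal sketch reusing the `generate_txt2img` call from the Quickstart (the weights shown are illustrative):

```python
from auto1111sdk import StableDiffusionPipeline

pipe = StableDiffusionPipeline("<Path to your local safetensors or checkpoint file>")

# Attention syntax: ((word)) or (word:1.21) up-weights that part of the prompt
weighted = pipe.generate_txt2img(prompt="a man in a (tuxedo:1.21)", steps=10)

# Composable Diffusion: uppercase AND combines multiple prompts, each with an optional weight
composed = pipe.generate_txt2img(prompt="a cat :1.2 AND a dog AND a penguin :2.2", steps=10)

weighted[0].save("tuxedo.png")
composed[0].save("composed.png")
```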
## Roadmap
- Adding support for Hires Fix and Refiner parameters for inference
- Adding support for LoRAs
- Adding support for face restoration
- Adding support for the Dreambooth training script
- Adding support for custom extensions like ControlNet
We will be adding support for these features very soon. We also accept any contributions to work on these issues!
## Contributing
Auto1111 SDK is continuously evolving, and we appreciate community involvement. We welcome all forms of contributions - bug reports, feature requests, and code contributions.
Report bugs and request features by opening an issue on GitHub. Contribute to the project by forking/cloning the repository and submitting a pull request with your changes.
## Credits
Licenses for borrowed code can be found in the `Settings -> Licenses` screen, and also in the `html/licenses.html` file.
- Automatic 1111 Stable Diffusion Web UI - https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Stable Diffusion - https://github.com/Stability-AI/stablediffusion, https://github.com/CompVis/taming-transformers
- k-diffusion - https://github.com/crowsonkb/k-diffusion.git
- ESRGAN - https://github.com/xinntao/ESRGAN
- MiDaS - https://github.com/isl-org/MiDaS
- Ideas for optimizations - https://github.com/basujindal/stable-diffusion
- Cross Attention layer optimization - Doggettx - https://github.com/Doggettx/stable-diffusion, original idea for prompt editing.
- Cross Attention layer optimization - InvokeAI, lstein - https://github.com/invoke-ai/InvokeAI (originally http://github.com/lstein/stable-diffusion)
- Sub-quadratic Cross Attention layer optimization - Alex Birch (https://github.com/Birch-san/diffusers/pull/1), Amin Rezaei (https://github.com/AminRezaei0x443/memory-efficient-attention)
- Textual Inversion - Rinon Gal - https://github.com/rinongal/textual_inversion (we're not using his code, but we are using his ideas).
- Idea for SD upscale - https://github.com/jquesnelle/txt2imghd
- Noise generation for outpainting mk2 - https://github.com/parlance-zz/g-diffuser-bot
- CLIP interrogator idea and borrowing some code - https://github.com/pharmapsychotic/clip-interrogator
- Idea for Composable Diffusion - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
- xformers - https://github.com/facebookresearch/xformers
- Sampling in float32 precision from a float16 UNet - marunine for the idea, Birch-san for the example Diffusers implementation (https://github.com/Birch-san/diffusers-play/tree/92feee6)