
This is a ChatGPT-4 English adaptation of the original README.md by kohya-ss.

I wanted to bring the Japanese tutorial to the English-speaking community, merging knowledge globally to take the development of Stable Diffusion models to another level.

Happy Fine-Tuning!

This repository contains training, generation, and utility scripts for Stable Diffusion.

The Change History has been moved to the bottom of the page.

A Japanese version of the README is also available.

For easier usage, including GUI and PowerShell scripts, please visit the repository maintained by bmaltais. Special thanks to @bmaltais!

This repository includes scripts for the following:

The Stable Diffusion web UI now appears to support LoRA training with sd-scripts. Thank you for the great work!

About requirements.txt

These files do not include requirements for PyTorch, as the required versions depend on your specific environment. Please install PyTorch first (refer to the installation guide below).

The scripts have been tested with PyTorch 1.12.1 and 1.13.0, as well as Diffusers 0.10.2.

Links to usage documentation

Most of the documents are written in Japanese.

Windows Required Dependencies

Python 3.10.6 and Git:

Grant unrestricted script access to PowerShell so that venv can work:
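As a sketch of this step: in a PowerShell terminal opened as Administrator, set the execution policy so that the venv activation script is allowed to run.

```shell
# Run in PowerShell as Administrator (one-time system configuration):
Set-ExecutionPolicy Unrestricted
# Answer "A" (Yes to All) when asked for confirmation.
```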

Windows Installation

Open a regular PowerShell terminal and enter the following commands:

git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts

python -m venv venv
.\venv\Scripts\activate

pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install --upgrade -r requirements.txt
pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl

cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py

accelerate config

Note: It is recommended to use python -m venv venv instead of python -m venv --system-site-packages venv to avoid potential issues with global Python packages.

Answers to accelerate config:

- This machine
- No distributed training
- NO
- NO
- NO
- all
- fp16

Note: Some users have reported encountering a ValueError: fp16 mixed precision requires a GPU error during training. In this case, answer 0 for the 6th question: What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:

(Only the single GPU with id 0 will be used.)

About PyTorch and xformers

Other versions of PyTorch and xformers may cause problems during training. If there are no other constraints, please install the specified version.

Optional: Use Lion8bit

To use Lion8bit, you need to upgrade bitsandbytes to version 0.38.0 or later. Uninstall bitsandbytes, and for Windows, install the Windows version of the .whl file from here or other sources, like:

pip install https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.38.1-py3-none-any.whl

After pulling a new version of this repository, reinstall it with pip install . and upgrade the necessary packages manually.
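Once bitsandbytes is upgraded, the 8-bit Lion optimizer can be selected through the training scripts' optimizer option. A minimal sketch, assuming train_network.py is the script in use; every path below is a placeholder, not a real value:

```shell
# Sketch: select the 8-bit Lion optimizer when launching training.
# All arguments other than --optimizer_type are illustrative placeholders.
accelerate launch train_network.py `
  --optimizer_type Lion8bit `
  --pretrained_model_name_or_path <base-model> `
  --train_data_dir <training-images> `
  --output_dir <output-dir>
```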

Upgrade

When a new release is available, you can upgrade your repository using the following command:

cd sd-scripts
git pull
.\venv\Scripts\activate
pip install --use-pep517 --upgrade -r requirements.txt

Once the commands have been executed successfully, you should be ready to use the new version.

Credits

The implementation for LoRA is based on cloneofsimo's repo. Thank you for the excellent work!

The LoRA expansion to Conv2d 3x3 was initially released by cloneofsimo, and its effectiveness was demonstrated by KohakuBlueleaf in LoCon. Thank you so much, KohakuBlueleaf!

License

The majority of the scripts are licensed under ASL 2.0 (including codes from Diffusers, cloneofsimo's, and LoCon). However, portions of the project are available under separate license terms:

Memory Efficient Attention Pytorch: MIT

bitsandbytes: MIT

BLIP: BSD-3-Clause

Change History

May 11, 2023

May 7, 2023

Please read the Releases for recent updates.

Naming of LoRA

To avoid confusion, the LoRA supported by train_network.py has been assigned specific names. The documentation has been updated accordingly. The following are the names of LoRA types in this repository:

  1. LoRA-LierLa: (LoRA for Linear Layers)

    This LoRA is applicable to Linear layers and Conv2d layers with a 1x1 kernel.

  2. LoRA-C3Lier: (LoRA for Convolutional layers with a 3x3 Kernel and Linear layers)

    In addition to the first type, this LoRA is applicable for Conv2d layers with a 3x3 kernel.

LoRA-LierLa is the default LoRA type for train_network.py (without conv_dim network argument). LoRA-LierLa can be used with our extension for AUTOMATIC1111's Web UI or the built-in LoRA feature of the Web UI.

To use LoRA-C3Lier with the Web UI, please utilize our extension.
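To train LoRA-C3Lier rather than the default LoRA-LierLa, pass conv_dim (and optionally conv_alpha) through --network_args. A hedged sketch; the paths and dimension values below are illustrative placeholders:

```shell
# Sketch: enable the Conv2d 3x3 expansion (LoRA-C3Lier) via --network_args.
# Paths and dimension values are placeholders, not recommendations.
accelerate launch train_network.py `
  --network_module networks.lora `
  --network_dim 16 --network_alpha 8 `
  --network_args "conv_dim=4" "conv_alpha=1" `
  --pretrained_model_name_or_path <base-model> `
  --train_data_dir <training-images> `
  --output_dir <output-dir>
```

Omitting --network_args entirely falls back to LoRA-LierLa, since conv_dim is what activates the 3x3 convolutional layers.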

Sample Image Generation During Training

An example prompt file might look like this:

# prompt 1
masterpiece, best quality, (1girl), in white shirts, upper body, looking at viewer, simple background --n low quality, worst quality, bad anatomy, bad composition, poor, low effort --w 768 --h 768 --d 1 --l 7.5 --s 28

# prompt 2
masterpiece, best quality, 1boy, in business suit, standing at street, looking back --n (low quality, worst quality), bad anatomy, bad composition, poor, low effort --w 576 --h 832 --d 2 --l 5.5 --s 40

Lines starting with # are treated as comments. Options for the generated image are specified after the prompt in the form of a double hyphen plus a lowercase letter. The following can be used: --n (negative prompt), --w (width), --h (height), --d (seed), --l (CFG scale), --s (number of sampling steps).

Prompt weightings, such as ( ) and [ ], are functional.
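Because lines starting with # are comments and blank lines are ignored, a prompt file can be sanity-checked from the shell before training. A small sketch (the filename prompts.txt is an assumption):

```shell
# Write a minimal prompt file; '#' lines are comments.
cat > prompts.txt <<'EOF'
# prompt 1
masterpiece, best quality, 1girl --n low quality --w 768 --h 768 --d 1 --l 7.5 --s 28
EOF

# Preview the non-comment, non-empty lines that will actually be used:
grep -v '^#' prompts.txt | grep -v '^$'
```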
