This is a ChatGPT-4 English adaptation of the original document (README.md) by kohya-ss.
I wanted to bring the Japanese tutorial to the English-speaking community, sharing this knowledge globally to help take the development of Stable Diffusion models to another level.
Happy Fine-Tuning!
This repository contains training, generation, and utility scripts for Stable Diffusion.
The Change History has been moved to the bottom of the page.
A Japanese version of the README is also available.
For easier usage, including GUI and PowerShell scripts, please visit the repository maintained by bmaltais. Special thanks to @bmaltais!
This repository includes scripts for the following:
- DreamBooth training, including U-Net and Text Encoder
- Fine-tuning (native training), including U-Net and Text Encoder
- LoRA training
- Textual Inversion training
- Image generation
- Model conversion (supports 1.x and 2.x, Stable Diffusion ckpt/safetensors, and Diffusers)
The Stable Diffusion web UI now appears to support LoRA training with sd-scripts. Thank you for the great work!
About requirements.txt
These files do not include requirements for PyTorch, as the required versions depend on your specific environment. Please install PyTorch first (refer to the installation guide below).
The scripts have been tested with PyTorch 1.12.1 and 1.13.0, as well as Diffusers 0.10.2.
Links to usage documentation
Most of the documents are written in Japanese.
- Training guide - common: data preparation, options, etc.
- Dataset config
- DreamBooth training guide
- Step-by-step fine-tuning guide
- LoRA training
- Textual Inversion training
- Image generation
- Model conversion on note.com
Windows Required Dependencies
Python 3.10.6 and Git:
- Python 3.10.6: https://www.python.org/ftp/python/3.10.6/python-3.10.6-amd64.exe
- Git: https://git-scm.com/download/win
Grant unrestricted script access to PowerShell so that venv can work:
- Open an administrator PowerShell window
- Type Set-ExecutionPolicy Unrestricted and answer A
- Close the admin PowerShell window
Windows Installation
Open a regular PowerShell terminal and enter the following commands:
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
python -m venv venv
.\venv\Scripts\activate
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install --upgrade -r requirements.txt
pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
accelerate config
Note: It is recommended to use python -m venv venv instead of python -m venv --system-site-packages venv to avoid potential issues with global Python packages.
Answers to accelerate config:
- This machine
- No distributed training
- NO
- NO
- NO
- all
- fp16
Note: Some users have reported encountering a ValueError: fp16 mixed precision requires a GPU error during training. In this case, answer 0 for the 6th question:
What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:
(Only the single GPU with id 0 will be used.)
About PyTorch and xformers
Other versions of PyTorch and xformers may cause problems during training. If there are no other constraints, please install the specified version.
Optional: Use Lion8bit
To use Lion8bit, you need to upgrade bitsandbytes to version 0.38.0 or later. Uninstall the existing bitsandbytes, and for Windows, install a Windows build of the .whl file from here or other sources, for example:
pip install https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.38.1-py3-none-any.whl
To upgrade, update this repository with pip install . and upgrade the necessary packages manually.
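Once the newer bitsandbytes is in place, Lion8bit can be selected as the optimizer when launching a training script. The command below is only a rough sketch: the paths are placeholders, and it assumes the training scripts' --optimizer_type option accepts the Lion8bit value (please check the training guide for the exact arguments):
accelerate launch --num_cpu_threads_per_process 1 train_network.py --pretrained_model_name_or_path model.safetensors --train_data_dir train_images --output_dir output --network_module networks.lora --optimizer_type Lion8bit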
Upgrade
When a new release is available, you can upgrade your repository using the following command:
cd sd-scripts
git pull
.\venv\Scripts\activate
pip install --use-pep517 --upgrade -r requirements.txt
Once the commands have been executed successfully, you should be ready to use the new version.
Credits
The implementation for LoRA is based on cloneofsimo's repo. Thank you for the excellent work!
The LoRA expansion to Conv2d 3x3 was initially released by cloneofsimo, and its effectiveness was demonstrated at LoCon by KohakuBlueleaf. Thank you so much, KohakuBlueleaf!
License
The majority of the scripts are licensed under ASL 2.0 (including codes from Diffusers, cloneofsimo's, and LoCon). However, portions of the project are available under separate license terms:
Memory Efficient Attention Pytorch: MIT
bitsandbytes: MIT
BLIP: BSD-3-Clause
Change History
May 11, 2023
- Added an option --dim_from_weights to train_network.py to automatically determine the dim (rank) from the weight file. PR #491 Thanks to AI-Casanova!
  - It is useful in combination with resize_lora.py. Please see the PR for details (a sketch of the usage follows this list).
- Fixed a bug where the noise resolution was incorrect with Multires noise. PR #489 Thanks to sdbds!
  - Please see the PR for details.
- The image generation scripts can now use img2img and highres fix simultaneously.
- Fixed a bug where the hint image of ControlNet was incorrectly BGR instead of RGB in the image generation scripts.
- Added a feature to the image generation scripts to use the memory-efficient VAE.
  - If you specify a number with the --vae_slices option, the memory-efficient VAE will be used. The maximum output size will be larger, but it will be slower. Please specify a value of about 16 or 32.
  - The implementation of the VAE is in library/slicing_vae.py.
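As an illustration of the --dim_from_weights option above, a LoRA resized with resize_lora.py could be used to continue training without specifying the dim manually. The command below is only a sketch: the file names are placeholders, and it assumes the existing --network_weights option is used to load the resized weights (see the PR for the exact usage):
accelerate launch train_network.py --network_module networks.lora --network_weights resized_lora.safetensors --dim_from_weights --pretrained_model_name_or_path model.safetensors --train_data_dir train_images --output_dir output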
May 7, 2023
- The documentation has been moved to the docs folder. If you have links, please update them accordingly.
- Removed gradio from requirements.txt.
- DAdaptAdaGrad, DAdaptAdan, and DAdaptSGD are now supported by DAdaptation. PR#455 Thanks to sdbds!
  - DAdaptation needs to be installed. Also, depending on the optimizer, DAdaptation may need to be updated. Please update with pip install --upgrade dadaptation.
- Added support for pre-calculation of LoRA weights in image generation scripts. Specify --network_pre_calc.
  - The prompt option --am is available. Also, it is disabled when Regional LoRA is used.
- Added Adaptive noise scale to each training script. Specify a number with --adaptive_noise_scale to enable it.
  - This is an experimental option. It may be removed or changed in the future.
  - This is an original implementation that automatically adjusts the value of the noise offset according to the absolute value of the mean of each channel of the latents. It is expected that appropriate noise offsets will be set for bright and dark images, respectively.
  - Specify it together with --noise_offset.
  - The actual value of the noise offset is calculated as noise_offset + abs(mean(latents, dim=(2,3))) * adaptive_noise_scale (see the sketch after this list). Since the latent is close to a normal distribution, it may be a good idea to specify a value of about 1/10 to the same as the noise offset.
  - Negative values can also be specified, in which case the noise offset will be clipped to 0 or more.
- Other minor fixes.
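The adaptive noise scale calculation above can be sketched in Python as follows. This is not the repository's actual code, just an illustration of the formula; the tensor names and the per-channel noise term follow the usual noise offset technique and are assumptions:

import torch

def noise_with_adaptive_offset(noise, latents, noise_offset, adaptive_noise_scale):
    # Per-sample, per-channel mean of the latents over the spatial dimensions: shape (B, C, 1, 1)
    latent_mean = latents.mean(dim=(2, 3), keepdim=True)
    # noise_offset + abs(mean(latents, dim=(2,3))) * adaptive_noise_scale, as described above
    offset = noise_offset + latent_mean.abs() * adaptive_noise_scale
    # A negative adaptive_noise_scale can push the offset below zero, so clip it to 0 or more
    offset = offset.clamp(min=0.0)
    # Apply the offset as extra per-channel noise, as in the plain noise offset technique
    extra = torch.randn(latents.shape[0], latents.shape[1], 1, 1, device=latents.device)
    return noise + offset * extra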
Please read the Releases for recent updates.
Naming of LoRA
To avoid confusion, the LoRA types supported by train_network.py have been assigned specific names. The documentation has been updated accordingly. The following are the names of LoRA types in this repository:
- LoRA-LierLa: (LoRA for Linear Layers) This LoRA is applicable to Linear layers and Conv2d layers with a 1x1 kernel.
- LoRA-C3Lier: (LoRA for Convolutional layers with a 3x3 Kernel and Linear layers) In addition to the first type, this LoRA is applicable to Conv2d layers with a 3x3 kernel.
LoRA-LierLa is the default LoRA type for train_network.py (without the conv_dim network argument). LoRA-LierLa can be used with our extension for AUTOMATIC1111's Web UI or with the built-in LoRA feature of the Web UI.
To use LoRA-C3Lier with the Web UI, please utilize our extension.
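As a rough example of training LoRA-C3Lier, the command below is only a sketch: the paths and dimensions are placeholders, and it assumes conv_dim and conv_alpha are passed via the --network_args option as described in the LoRA training guide:
accelerate launch train_network.py --network_module networks.lora --network_dim 16 --network_alpha 8 --network_args "conv_dim=8" "conv_alpha=1" --pretrained_model_name_or_path model.safetensors --train_data_dir train_images --output_dir output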
Sample Image Generation During Training
An example prompt file might look like this:
# prompt 1
masterpiece, best quality, (1girl), in white shirts, upper body, looking at viewer, simple background --n low quality, worst quality, bad anatomy, bad composition, poor, low effort --w 768 --h 768 --d 1 --l 7.5 --s 28
# prompt 2
masterpiece, best quality, 1boy, in business suit, standing at street, looking back --n (low quality, worst quality), bad anatomy, bad composition, poor, low effort --w 576 --h 832 --d 2 --l 5.5 --s 40
Lines starting with # are treated as comments. You can specify options for the generated image with options like --n after the prompt. The following can be used:
- --n Negative prompt up to the next option.
- --w Specifies the width of the generated image.
- --h Specifies the height of the generated image.
- --d Specifies the seed of the generated image.
- --l Specifies the CFG scale of the generated image.
- --s Specifies the number of steps in the generation.
Prompt weightings, such as ( ) and [ ], are functional.
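The prompt file is then passed to the training script to enable sample generation during training. The line below is only a sketch, assuming the training scripts expose --sample_prompts and --sample_every_n_steps options for this feature; please check the training guide for the exact option names and values:
--sample_prompts prompts.txt --sample_every_n_steps 200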