SimpleTuner 💹

⚠️ Warning: The scripts in this repository have the potential to damage your training data. Always maintain backups before proceeding.

SimpleTuner is geared towards simplicity, with a focus on making the code easily understood. This codebase serves as a shared academic exercise, and contributions are welcome.

Table of Contents

Design Philosophy

Tutorial

Please fully explore this README before embarking on the tutorial, as it contains vital information that you might need to know first.

For a quick start without reading the full documentation, you can use the Quick Start guide.

For memory-constrained systems, see the DeepSpeed document which explains how to use 🤗Accelerate to configure Microsoft's DeepSpeed for optimiser state offload.

For multi-node distributed training, this guide will help you adapt the configurations from the INSTALL and Quickstart guides for multi-node training and optimise them for image datasets numbering in the billions of samples.


Features

Flux.1

Full training support for Flux.1 is included.

See hardware requirements or the quickstart guide.

PixArt Sigma

SimpleTuner has extensive training integration with PixArt Sigma - both the 600M & 900M models load without modification.

See the PixArt Quickstart guide to start training.

Stable Diffusion 3

See the Stable Diffusion 3 Quickstart to get going.

Kwai Kolors

An SDXL-based model with ChatGLM (General Language Model) 6B as its text encoder, doubling the hidden dimension size and substantially increasing the level of local detail included in the prompt embeds.

Kolors support is almost as deep as SDXL, minus ControlNet training support.

Legacy Stable Diffusion models

RunwayML's SD 1.5 and StabilityAI's SD 2.x are both trainable under the legacy designation.


Hardware Requirements

NVIDIA

Pretty much any card from the RTX 3080 up is a safe bet. YMMV.

AMD

LoRA and full-rank tuning are verified working on a 7900 XTX 24GB and MI300X.

Lacking xformers, AMD hardware will use more memory than equivalent NVIDIA hardware.

Apple

LoRA and full-rank tuning are tested to work on an M3 Max with 128G memory, taking about 12G of "Wired" memory and 4G of system memory for SDXL.

Flux.1 [dev, schnell]

Flux prefers being trained with multiple large GPUs, but a single 16G card should be able to do it with quantisation of the transformer and text encoders.

SDXL, 1024px

Stable Diffusion 2.x, 768px

Toolkit

For more information about the associated toolkit distributed with SimpleTuner, refer to the toolkit documentation.

Setup

Detailed setup information is available in the installation documentation.

Troubleshooting

Enable debug logs for more detailed insight by adding export SIMPLETUNER_LOG_LEVEL=DEBUG to your environment file (config/config.env).

For performance analysis of the training loop, setting SIMPLETUNER_TRAINING_LOOP_LOG_LEVEL=DEBUG adds timestamps to the logs that highlight any issues in your configuration.
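
As a rough sketch, the two debug settings described above might sit together in config/config.env like this. It assumes the file is sourced as a shell script, which the export syntax in the text suggests. The rest of the file is omitted, and whether you want both settings at once depends on what you are investigating.

```shell
# config/config.env (excerpt; other settings omitted)

# More detailed logging across SimpleTuner.
export SIMPLETUNER_LOG_LEVEL=DEBUG

# Timestamped debug output for the training loop, for performance analysis.
export SIMPLETUNER_TRAINING_LOOP_LOG_LEVEL=DEBUG
```

Debug output is verbose, so you may want to remove or comment out these lines once the issue is resolved.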

For a comprehensive list of options available, consult this documentation.

Discord

For more help or to discuss training with like-minded folks, join our Discord server.