Block-removed Knowledge-distilled Stable Diffusion

Official codebase for BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion [ArXiv] [ECCV 2024].

BK-SDMs are lightweight text-to-image (T2I) synthesis models, obtained by removing residual and attention blocks from the U-Net of Stable Diffusion and by distillation pretraining on a limited amount of data.

⚡Quick Links: KD Pretraining | Evaluation on MS-COCO | DreamBooth Finetuning | Demo

Notice

Model Description
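
For the full architecture details, please refer to the paper. As a quick way to see the effect of block removal, the following minimal sketch (using the checkpoints referenced in this README) compares the U-Net parameter counts of SD-v1.4 and BK-SDM-Small:

import torch
from diffusers import UNet2DConditionModel

# Original SD-v1.4 (teacher) U-Net vs. block-removed BK-SDM-Small (student) U-Net.
teacher_unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)
student_unet = UNet2DConditionModel.from_pretrained(
    "nota-ai/bk-sdm-small", subfolder="unet"
)

def count_params(model: torch.nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())

print(f"SD-v1.4 U-Net:      {count_params(teacher_unet) / 1e6:.1f}M params")
print(f"BK-SDM-Small U-Net: {count_params(student_unet) / 1e6:.1f}M params")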

Installation

conda create -n bk-sdm python=3.8
conda activate bk-sdm
git clone https://github.com/Nota-NetsPresso/BK-SDM.git
cd BK-SDM
pip install -r requirements.txt

Note on the torch versions we've used:

Minimal Example with 🤗Diffusers

With the default PNDM scheduler and 50 denoising steps:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("nota-ai/bk-sdm-small", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a golden vase with different flowers"
image = pipe(prompt).images[0]
image.save("example.png")
<details> <summary>Equivalent code (modifying solely the U-Net of SD-v1.4 while preserving its Text Encoder and Image Decoder):</summary>
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet = UNet2DConditionModel.from_pretrained("nota-ai/bk-sdm-small", subfolder="unet", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a golden vase with different flowers"
image = pipe(prompt).images[0]
image.save("example.png")
</details>

Distillation Pretraining

Our code was based on train_text_to_image.py of Diffusers 0.15.0; for the latest version of that script, see the Diffusers repository.

[Optional] Toy to check runnability

bash scripts/get_laion_data.sh preprocessed_11k
bash scripts/kd_train_toy.sh

Single-gpu training for BK-SDM-{Base, Small, Tiny}

bash scripts/get_laion_data.sh preprocessed_212k
bash scripts/kd_train.sh

Single-gpu training for BK-SDM-{Base-2M, Small-2M, Tiny-2M}

bash scripts/get_laion_data.sh preprocessed_2256k
bash scripts/kd_train_2m.sh

Multi-gpu training

bash scripts/kd_train_toy_ddp.sh

Compression of SD-v2 with BK-SDM

bash scripts/kd_train_v2-base-im512.sh
bash scripts/kd_train_v2-im768.sh

# For inference, see: 'scripts/generate_with_trained_unet.sh'  

Note on training code

<details> <summary> Key learning hyperparams </summary>
--unet_config_name "bk_small" # option: ["bk_base", "bk_small", "bk_tiny"]
--use_copy_weight_from_teacher # initialize student unet with teacher weights
--learning_rate 5e-05
--train_batch_size 64
--gradient_accumulation_steps 4
--lambda_sd 1.0
--lambda_kd_output 1.0
--lambda_kd_feat 1.0
</details>
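
The three lambda weights above balance the denoising task loss, the output-level KD loss, and the feature-level KD loss. Below is a minimal sketch of how such a combined objective can be formed; the tensor names (noise, noise_pred_student, noise_pred_teacher, per-block feature lists) are illustrative and do not mirror the actual training script:

import torch
import torch.nn.functional as F

def kd_loss(noise, noise_pred_student, noise_pred_teacher,
            feats_student, feats_teacher,
            lambda_sd=1.0, lambda_kd_output=1.0, lambda_kd_feat=1.0):
    # Task (denoising) loss: student prediction vs. the true added noise.
    loss_sd = F.mse_loss(noise_pred_student, noise)
    # Output-level KD: student prediction vs. frozen teacher prediction.
    loss_kd_output = F.mse_loss(noise_pred_student, noise_pred_teacher.detach())
    # Feature-level KD: per-block feature maps of student vs. teacher.
    loss_kd_feat = sum(
        F.mse_loss(fs, ft.detach()) for fs, ft in zip(feats_student, feats_teacher)
    )
    return (lambda_sd * loss_sd
            + lambda_kd_output * loss_kd_output
            + lambda_kd_feat * loss_kd_feat)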

Evaluation on MS-COCO Benchmark

We used the following code to obtain the results on MS-COCO. After generating 512×512 images with the PNDM scheduler and 25 denoising steps, we downsampled them to 256×256 for computing the scores.
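
For reference, a minimal sketch of this evaluation setting (not the actual benchmark script) generates a 512×512 image with 25 denoising steps and resizes it to 256×256:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("nota-ai/bk-sdm-small", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a golden vase with different flowers"
# 512x512 generation with the default PNDM scheduler and 25 denoising steps.
image = pipe(prompt, num_inference_steps=25).images[0]
# Downsample to 256x256 before computing FID, IS, and CLIP score.
image = image.resize((256, 256))
image.save("example_256.png")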

Generation with released models (using BK-SDM-Small as default)

On a single 3090 GPU, the image generation step takes about 10 hours per model, and the score computation takes a few minutes.

[After training] Generation with a trained U-Net

bash scripts/get_mscoco_files.sh
bash scripts/generate_with_trained_unet.sh

Results on Zero-shot MS-COCO 256×256 30K

See Results in MODEL_CARD.md

DreamBooth Finetuning with 🤗PEFT

Our lightweight SD backbones can be used for efficient personalized generation. DreamBooth refines text-to-image diffusion models given a small number of images. DreamBooth+LoRA can drastically reduce finetuning cost.
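
As an illustration, the following sketch attaches LoRA adapters to the BK-SDM-Base U-Net with 🤗PEFT; the checkpoint ID, rank, and target modules here are assumptions for the example, not the exact settings of our finetuning script:

from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Checkpoint ID assumed from the BK-SDM-Base naming used in this section.
unet = UNet2DConditionModel.from_pretrained("nota-ai/bk-sdm-base", subfolder="unet")

# Attach low-rank adapters to the attention projection layers only;
# the base (block-removed) U-Net weights remain frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["to_q", "to_k", "to_v"],
    lora_dropout=0.0,
)
unet = get_peft_model(unet, lora_config)
unet.print_trainable_parameters()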

DreamBooth dataset

The dataset is downloaded to ./data/dreambooth/dataset [folder tree]: 30 subjects × 25 prompts × 4∼6 images each.

git clone https://github.com/google/dreambooth ./data/dreambooth

DreamBooth finetuning (using BK-SDM-Base as default)

Our code was based on train_dreambooth.py of PEFT 0.1.0; for the latest version of that script, see the PEFT repository.

Results of Personalized Generation

See DreamBooth Results in MODEL_CARD.md

Gradio Demo

Check out our Gradio demo and the code (main: app.py)! <details> <summary> [Aug/01/2023] featured in Hugging Face Spaces of the week 🔥 </summary> <img alt="Spaces of the week" src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/screenshot_spaces_of_the_week.png" width="100%"> </details>
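
For reference, a bare-bones text-to-image interface in Gradio (a simplified sketch, not the full app.py) might look like:

import torch
import gradio as gr
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("nota-ai/bk-sdm-small", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

def generate(prompt):
    # Single image per prompt with the default scheduler settings.
    return pipe(prompt).images[0]

demo = gr.Interface(fn=generate, inputs="text", outputs="image", title="BK-SDM demo")
demo.launch()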

Core ML Weights

For iOS or macOS applications, we have converted our models to Core ML format. They are available at 🤗Hugging Face Models (nota-ai/coreml-bk-sdm) and can be used with Apple's Core ML Stable Diffusion library.

License

This project, along with its weights, is subject to the CreativeML Open RAIL-M license, which aims to mitigate any potential negative effects arising from the use of highly advanced machine learning systems. A summary of this license is as follows.

1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content,
2. We claim no rights on the outputs you generate, you are free to use them and are accountable for their use which should not go against the provisions set in the license, and
3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users.

Acknowledgments

Citation

@article{kim2023bksdm,
  title={BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion},
  author={Kim, Bo-Kyeong and Song, Hyoung-Kyu and Castells, Thibault and Choi, Shinkook},
  journal={arXiv preprint arXiv:2305.15798},
  year={2023},
  url={https://arxiv.org/abs/2305.15798}
}