<div align="center">

# LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis

Chang Liu, Rui Li, Kaidong Zhang, Xin Luo, Dong Liu

[Paper] / [Project] / [Huggingface] / [ModelScope] / [Demo]

</div>
## Table of Contents
- <u>1. News</u>
- <u>2. To-Do Lists</u>
- <u>3. Overview of LaCon</u>
- <u>4. Code Structure</u>
- <u>5. Prerequisites</u>
- <u>6. Training of Condition Aligner</u>
- <u>7. Sampling with Condition Aligner</u>
- <u>8. Evaluation</u>
- <u>9. Results</u>
- <u>10. Citation</u>
- <u>11. Stars, Forks, and Star History</u>
If you have any questions about this work, please feel free to open a new issue or propose a PR.
## News <!-- omit in toc -->
- [Jun. 12th] We have updated the training and sampling code of LaCon. Pre-trained model weights are currently available at our Huggingface repo and ModelScope repo.
## To-Do Lists <!-- omit in toc -->
- Upload a newer version of the paper to arXiv
- Update the codebase
- Update the repo document
- Upload the pre-trained model weights of LaCon based on Celeb and Stable Diffusion v1.4
- Update the pre-trained model weights of LaCon based on Stable Diffusion v2.1
- Update implementation for local Gradio demo
- Update online HuggingFace demo
## Overview of LaCon <!-- omit in toc -->
Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability over the generation process, existing studies, termed early-constraint methods in this paper, leverage extra conditions and incorporate them into pre-trained diffusion models. Some of them adopt condition-specific modules to handle each condition separately, and therefore struggle to generalize to other conditions. Although follow-up studies present unified solutions to this generalization problem, they still require extra resources, e.g., additional inputs or parameter optimization, so more flexible and efficient solutions for steerable guided image synthesis remain desirable. In this paper, we present an alternative paradigm, namely Late-Constraint Diffusion (LaCon), which simultaneously integrates various conditions into pre-trained diffusion models. Specifically, LaCon establishes an alignment between the external condition and the internal features of the diffusion model, and uses this alignment to incorporate the target condition, guiding the sampling process to produce tailored results. Experimental results on the COCO dataset illustrate the effectiveness and superior generalization capability of LaCon under various conditions and settings. Ablation studies investigate the functionalities of its different components and illustrate LaCon's great potential as an efficient and flexible way to control diffusion models.
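To give a flavor of the mechanism, the sketch below illustrates the late-constraint guidance idea in PyTorch. It is a minimal sketch under assumed interfaces (a `unet` that returns both the noise prediction and internal features, and an `aligner` that maps those features into condition space), not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def guided_eps(unet, aligner, x_t, t, text_emb, target_cond, cond_scale):
    """Steer one denoising step with the gradient of a condition-alignment loss."""
    with torch.enable_grad():
        x_t = x_t.detach().requires_grad_(True)
        eps, feats = unet(x_t, t, text_emb)        # noise prediction + internal features (assumed interface)
        pred_cond = aligner(feats, t)              # project internal features into condition space
        loss = F.mse_loss(pred_cond, target_cond)  # alignment error w.r.t. the target condition
        grad = torch.autograd.grad(loss, x_t)[0]   # direction in latent space that reduces the error
    return eps + cond_scale * grad                 # shift the prediction toward the condition
```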
<u><small><🎯Back to Table of Contents></small></u>
## Code Structure <!-- omit in toc -->
This GitHub repo is organized according to the structure below:
```
LaCon/
├── condition_aligner_src/              <----- Source code of LaCon
│   ├── __init__.py
│   ├── condition_aligner_dataset.py    <----- Dataset
│   ├── condition_aligner_model.py      <----- Model
│   └── condition_aligner_runner.py     <----- Runner (train and inference)
├── configs/                            <----- Configuration files
├── data-preprocessing/                 <----- Code of data pre-processing
├── evaluation-metrics/                 <----- Code of evaluation metrics
├── github-materials/
├── ldm/                                <----- Source code of LDM (Stable Diffusion)
├── taming/                             <----- Source code of the taming package
├── tools/                              <----- Code of toolkits to assist data pre-processing
├── README.md
├── condition-aligner-inference.py      <----- Script to reconstruct conditions with the condition aligner
├── condition-aligner-train.py          <----- Script to train the condition aligner
├── generate-batch-image.py             <----- Script to generate results in batch
├── generate-single-image.py            <----- Script to generate a single result
└── install.sh                          <----- Bash script to install the virtual environment
```
<u><small><🎯Back to Table of Contents></small></u>
## Prerequisites <!-- omit in toc -->
- To install the virtual environment of LaCon, you can execute the following command lines:

  ```bash
  conda create -n lacon
  conda activate lacon
  pip install torch==2.0.0 torchvision==0.15.1
  bash install.sh
  ```
- To prepare the pre-trained model weights of the different components in Stable Diffusion, as well as our condition aligner, download the model weights from our Huggingface repo and put them in `./checkpoints`. Once the weights are downloaded, modify the configuration files in `./configs`; check this document for more details on modifying the configuration files. We strongly recommend downloading the entire Huggingface repo of CLIP locally to avoid network issues with Huggingface. An illustrative configuration excerpt is sketched after this list.
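The excerpt below is only a hypothetical illustration of what such a configuration might contain; the key names (`ckpt_path`, `cond_type`, `model_path`) are our assumptions, so follow the linked document for the actual fields used in `./configs`:

```yaml
# Hypothetical excerpt of a config in ./configs — key names are illustrative only.
model:
  ckpt_path: ./checkpoints/sd-v1-4.ckpt             # Stable Diffusion backbone weights
condition_aligner:
  ckpt_path: ./checkpoints/sdv14_edge.pth           # trained condition aligner weights
  cond_type: edge                                   # condition type this aligner handles
clip:
  model_path: ./checkpoints/clip-vit-large-patch14  # local copy of the CLIP repo
```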
<u><small><🎯Back to Table of Contents></small></u>
## Training of Condition Aligner <!-- omit in toc -->
- We use a subset of the COCO training set with approximately 10,000 data samples. To train the condition aligner, you need to follow the instructions in this document and organize the data in the following structure:
  ```
  data/
  ├── bdcn-edges/
  │   ├── 1.png
  │   ├── 2.png
  │   └── ...
  ├── saliency-masks/
  │   ├── 1.png
  │   ├── 2.png
  │   └── ...
  ├── color-strokes/
  │   ├── 1.png
  │   ├── 2.png
  │   └── ...
  ├── coco-captions/
  │   ├── 1.txt
  │   ├── 2.txt
  │   └── ...
  └── images/
  ```
- Once the training data is ready, you need to modify the configuration files following this document.
- Now you are ready to go by executing the following command line:

  ```bash
  python condition-aligner-train.py -b CONFIG_PATH -l OUTPUT_PATH
  ```

  You can refer to this example command line:

  ```bash
  python condition-aligner-train.py -b configs/sd-edge.yaml -l outputs/training/sd-edge
  ```
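For intuition, the following is a minimal, illustrative training step, not the repo's actual interface: the names `unet`, `aligner`, `scheduler`, and their signatures are assumptions. It reflects the idea described in the overview: the pre-trained diffusion model stays frozen while the aligner learns to regress the target condition from the model's internal features.

```python
import torch
import torch.nn.functional as F

def train_step(unet, aligner, optimizer, scheduler, x0, target_cond, text_emb):
    # Sample a random timestep and apply forward diffusion (assumed scheduler API).
    t = torch.randint(0, scheduler.num_steps, (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    x_t = scheduler.add_noise(x0, noise, t)
    with torch.no_grad():                  # the pre-trained diffusion model stays frozen
        _, feats = unet(x_t, t, text_emb)
    pred_cond = aligner(feats, t)          # only the aligner receives gradients
    loss = F.mse_loss(pred_cond, target_cond)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```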
<u><small><🎯Back to Table of Contents></small></u>
## Sampling with Condition Aligner <!-- omit in toc -->
Execute the following command line to generate an image with the trained condition aligner:

```bash
python generate-single-image.py --cond_type COND_TYPE --indir CONDITION_PATH --resume CONDITION_ALIGNER_PATH --caption TEXT_PROMPT --cond_scale CONTROLLING_SCALE --unconditional_guidance_scale CLASSIFIER_FREE_GUIDANCE_SCALE --outdir OUTPUT_PATH -b CONFIG_PATH --seed SEED --truncation_steps TRUNCATION_STEPS --use_neg_prompt
```

You can refer to this example command line:

```bash
python generate-single-image.py --cond_type mask --indir examples/horse.png --resume checkpoints/sdv14_mask.pth --caption "a horse standing in the moon surface" --cond_scale 2.0 --unconditional_guidance_scale 6.0 --outdir outputs/ -b configs/sd-mask.yaml --seed 23 --truncation_steps 600 --use_neg_prompt
```
We suggest the following settings to achieve optimal performance under various conditions:
| Condition | Setting | Model Weight | Controlling Scale | Truncation Steps |
|---|---|---|---|---|
| Canny Edge | Unconditional Generation | `sd_celeb_edge.pth` | 2.0 | 500 |
| HED Edge | Unconditional Generation | `sd_celeb_edge.pth` | 2.0 | 500 |
| User Sketch | Unconditional Generation | `sd_celeb_edge.pth` | 2.0 | 600 |
| Color Stroke | Unconditional Generation | `sd_celeb_color.pth` | 2.0 | 600 |
| Image Palette | Unconditional Generation | `sd_celeb_color.pth` | 2.0 | 800 |
| Canny Edge | T2I Generation | `sdv14_edge.pth` | 2.0 | 500 |
| HED Edge | T2I Generation | `sdv14_edge.pth` | 2.5 | 500 |
| User Sketch | T2I Generation | `sdv14_edge.pth` | 2.0 | 600 |
| Color Stroke | T2I Generation | `sdv14_color.pth` | 2.0 | 600 |
| Image Palette | T2I Generation | `sdv14_color.pth` | 2.0 | 800 |
| Saliency Mask | T2I Generation | `sdv14_mask.pth` | 2.0 | 600 |
| User Scribble | T2I Generation | `sdv14_mask.pth` | 2.0 | 700 |
<u><small><🎯Back to Table of Contents></small></u>
## Evaluation <!-- omit in toc -->
Prepare the test set following the data structure below:
```
data/
├── bdcn-edges/
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── saliency-masks/
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── color-strokes/
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── image-palette/
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── coco-captions/
│   ├── 1.txt
│   ├── 2.txt
│   └── ...
└── images/
```
Execute the following command line to test all data samples in the test set:

```bash
python generate-batch-image.py -b CONFIG_PATH --indir DATA_FILELIST_PATH --text CAPTION_PATH --target_cond CONDITION_PATH --resume CONDITION_ALIGNER_PATH --cond_scale CONTROLLING_SCALE --truncation_steps TRUNCATION_STEPS
```

You can refer to this example command line:

```bash
python generate-batch-image.py -b configs/sd-mask.yaml --indir data/coco2017val/data_flist.txt --text data/coco2017val/coco-captions --target_cond data/coco2017val/saliency-masks --resume checkpoints/sdv14_mask.pth --cond_scale 2.0 --truncation_steps 600
```
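`--indir` expects a file list such as `data/coco2017val/data_flist.txt`. If you need to build one, the sketch below is a minimal helper; it assumes one image path per line, which is an assumption on our part, so check the repo's data-preprocessing code for the exact format:

```python
from pathlib import Path

# Collect image paths (assumed layout: data/coco2017val/images/*.png|*.jpg).
image_dir = Path("data/coco2017val/images")
paths = sorted(p for p in image_dir.iterdir() if p.suffix.lower() in {".png", ".jpg"})

# Write one path per line (assumed file-list format).
Path("data/coco2017val/data_flist.txt").write_text("\n".join(str(p) for p in paths))
```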
To compute the evaluation metrics (e.g., FID and CLIP score), please refer to this document for more details; a minimal metric-computation sketch also follows the table below. We report the performance of LaCon on the COCO 2017 validation set in the following table:
| Condition | Model Weight | FID | CLIP Score |
|---|---|---|---|
| HED Edge | `sdv14_edge.pth` | 21.02 | 0.2590 |
| Color Stroke | `sdv14_color.pth` | 20.27 | 0.2589 |
| Image Palette | `sdv14_color.pth` | 20.61 | 0.2580 |
| Saliency Mask | `sdv14_mask.pth` | 20.94 | 0.2617 |
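As one possible way to compute these metrics, here is a sketch using `torchmetrics`; the repo's own `evaluation-metrics` code may use different preprocessing and model choices, and the tensors and captions below are placeholders you must supply:

```python
# Requires: pip install "torchmetrics[image,multimodal]"
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

fid = FrechetInceptionDistance(feature=2048)
clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

def evaluate(real_images, fake_images, captions):
    """real_images/fake_images: uint8 tensors of shape (N, 3, H, W);
    captions: list of N strings paired with the generated images."""
    fid.update(real_images, real=True)
    fid.update(fake_images, real=False)
    clip_score.update(fake_images, captions)
    return fid.compute().item(), clip_score.compute().item()
```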
<u><small><🎯Back to Table of Contents></small></u>
## Results <!-- omit in toc -->
<details>
<summary> We demonstrate results generated by LaCon under various conditions in the following figures. </summary>
<div align="center"> Canny Edge </div>
<div align="center"> HED Edge </div>
<div align="center"> User Sketch </div>
<div align="center"> Color Stroke </div>
<div align="center"> Image Palette </div>
<div align="center"> Mask </div>
</details>

<u><small><🎯Back to Table of Contents></small></u>
## Citation <!-- omit in toc -->
If you find our paper helpful to your work, please cite it with the following BibTeX reference:
```bibtex
@misc{liu-etal-2024-lacon,
  title={{LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis}},
  author={Liu, Chang and Li, Rui and Zhang, Kaidong and Luo, Xin and Liu, Dong},
  year={2024},
  eprint={2305.11520},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
<u><small><🎯Back to Table of Contents></small></u>
## Stars, Forks, and Star History <!-- omit in toc -->