<div align="center">

# LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis

Chang Liu, Rui Li, Kaidong Zhang, Xin Luo, Dong Liu

[Paper] / [Project] / [Huggingface] / [ModelScope] / [Demo]

</div>

## Table of Contents <!-- omit in toc -->

- [News](#news)
- [To-Do Lists](#to-do-lists)
- [Overview of LaCon](#overview-of-lacon)
- [Code Structure](#code-structure)
- [Prerequisites](#prerequisites)
- [Training of Condition Aligner](#training-of-condition-aligner)
- [Sampling with Condition Aligner](#sampling-with-condition-aligner)
- [Evaluation](#evaluation)
- [Results](#results)
- [Citation](#citation)
- [Stars, Forks, and Star History](#stars-forks-and-star-history)

If you have any questions about this work, please feel free to open a new issue or propose a PR.

## News <!-- omit in toc -->

## To-Do Lists <!-- omit in toc -->

## Overview of LaCon <!-- omit in toc -->


Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability over the generation process, existing studies, termed early-constraint methods in this paper, leverage extra conditions and incorporate them into pre-trained diffusion models. Some of these adopt condition-specific modules that handle each condition separately, and consequently struggle to generalize to other conditions. Although follow-up studies present unified solutions to this generalization problem, they require extra resources, e.g., additional inputs or parameter optimization, so more flexible and efficient solutions for steerable guided image synthesis are still desirable. In this paper, we present an alternative paradigm, namely Late-Constraint Diffusion (LaCon), to simultaneously integrate various conditions into pre-trained diffusion models. Specifically, LaCon establishes an alignment between the external condition and the internal features of the diffusion model, and utilizes this alignment to incorporate the target condition, guiding the sampling process to produce tailored results. Experimental results on the COCO dataset illustrate the effectiveness and superior generalization capability of LaCon under various conditions and settings. Ablation studies investigate the functionalities of different components in LaCon and illustrate its potential to serve as an efficient solution offering flexible controllability for diffusion models.
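As a concrete illustration of this late-constraint idea, the sketch below shows one way such guidance can be injected into a denoising step: the latent is nudged by the gradient of an alignment error between the aligner's condition estimate (computed from intermediate UNet features) and the target condition. This is a schematic sketch, not this repo's actual implementation; `unet(..., return_features=True)` and the `aligner` interface are assumed stand-ins for the components described above.

```python
# Schematic sketch of late-constraint guidance (assumptions throughout:
# `unet` can return its intermediate features, and `aligner` maps those
# features to condition space; neither matches this repo's real API).
import torch
import torch.nn.functional as F

def late_constraint_step(z_t, t, unet, aligner, target_cond, cond_scale):
    """Steer one denoising step so the aligner's condition estimate,
    computed from the UNet's internal features, matches the target."""
    z_t = z_t.detach().requires_grad_(True)
    with torch.enable_grad():
        eps, feats = unet(z_t, t, return_features=True)  # noise pred + features
        pred_cond = aligner(feats, t)                    # features -> condition space
        loss = F.mse_loss(pred_cond, target_cond)        # alignment error
    grad = torch.autograd.grad(loss, z_t)[0]
    # Move the latent down the alignment-error gradient, like classifier guidance.
    return eps.detach(), (z_t - cond_scale * grad).detach()
```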

<u><small><🎯Back to Table of Contents></small></u>

## Code Structure <!-- omit in toc -->

This GitHub repo is organized according to the structure below:

```
LaCon/
├── condition_aligner_src                  <----- Source code of LaCon
│   ├── __init__.py
│   ├── condition_aligner_dataset.py       <----- Dataset
│   ├── condition_aligner_model.py         <----- Model
│   └── condition_aligner_runner.py        <----- Runner (train and inference)
├── configs                                <----- Configuration files
├── data-preprocessing                     <----- Code of data pre-processing
├── evaluation-metrics                     <----- Code of evaluation metrics
├── github-materials
├── ldm                                    <----- Source code of LDM (Stable Diffusion)
├── taming                                 <----- Source code of the `taming` package
├── tools                                  <----- Toolkits to assist data pre-processing
├── README.md
├── condition-aligner-inference.py         <----- Script to reconstruct conditions with the condition aligner
├── condition-aligner-train.py             <----- Script to train the condition aligner
├── generate-batch-image.py                <----- Script to generate results in batch
├── generate-single-image.py               <----- Script to generate a single result
└── install.sh                             <----- Bash script to install the virtual environment
```

<u><small><🎯Back to Table of Contents></small></u>

## Prerequisites <!-- omit in toc -->

1. To install the virtual environment of LaCon, execute the following command lines:

   ```bash
   conda create -n lacon
   conda activate lacon
   pip install torch==2.0.0 torchvision==0.15.1
   bash install.sh
   ```

2. To prepare the pre-trained weights of the different components in Stable Diffusion, as well as our condition aligner, download the model weights from our Huggingface repo and put them in `./checkpoints`. Once the weights are downloaded, modify the configuration files in `./configs`; check this document for more details on modifying the configuration files. We strongly recommend downloading the whole Huggingface repo of CLIP locally, in order to avoid network issues with Huggingface.
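If you prefer fetching everything from the command line, the snippet below is one way to do it with the Huggingface CLI. `LACON_HF_REPO` is a placeholder for the actual Huggingface repo linked above, and `openai/clip-vit-large-patch14` is the CLIP model commonly used by Stable Diffusion v1.x; check the configuration files for the exact paths expected.

```bash
# One way to fetch the weights locally with the Huggingface CLI.
# LACON_HF_REPO is a placeholder; substitute the repo linked above.
pip install "huggingface_hub[cli]"
huggingface-cli download LACON_HF_REPO --local-dir ./checkpoints
# Downloading the CLIP repo locally avoids network issues at run time.
huggingface-cli download openai/clip-vit-large-patch14 --local-dir ./checkpoints/clip-vit-large-patch14
```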

<u><small><🎯Back to Table of Contents></small></u>

## Training of Condition Aligner <!-- omit in toc -->

1. We use a subset of the COCO training set with approximately 10,000 data samples. To train the condition aligner, follow the instructions in this document and organize the data in the following structure (a sanity-check sketch is provided after the example command below):

   ```
   data/
   ├── bdcn-edges
   │   ├── 1.png
   │   ├── 2.png
   │   └── ...
   ├── saliency-masks
   │   ├── 1.png
   │   ├── 2.png
   │   └── ...
   ├── color-strokes
   │   ├── 1.png
   │   ├── 2.png
   │   └── ...
   ├── coco-captions
   │   ├── 1.txt
   │   ├── 2.txt
   │   └── ...
   └── images
   ```

2. Once the training data is ready, modify the configuration files following this document.
3. Now you are ready to go by executing the following command line:

   ```bash
   python condition-aligner-train.py -b CONFIG_PATH -l OUTPUT_PATH
   ```

You can refer to this example command line:

```bash
python condition-aligner-train.py -b configs/sd-edge.yaml -l outputs/training/sd-edge
```
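Before launching a run, it can save time to confirm that every image has a matching condition map and caption. Below is a minimal, hypothetical sanity check for the layout above; it is not part of this repo and assumes files are matched by basename:

```python
# Hypothetical sanity check for the training-data layout (matched basenames).
from pathlib import Path

root = Path("data")
image_ids = {p.stem for p in (root / "images").iterdir()}
for subdir in ["bdcn-edges", "saliency-masks", "color-strokes", "coco-captions"]:
    present = {p.stem for p in (root / subdir).iterdir()}
    missing = sorted(image_ids - present)
    if missing:
        print(f"{subdir}: missing {len(missing)} samples, e.g. {missing[:3]}")
    else:
        print(f"{subdir}: OK ({len(present)} files)")
```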

<u><small><🎯Back to Table of Contents></small></u>

## Sampling with Condition Aligner <!-- omit in toc -->

Execute the following command line to generate an image with the trained condition aligner:

```bash
python generate-single-image.py --cond_type COND_TYPE --indir CONDITION_PATH --resume CONDITION_ALIGNER_PATH --caption TEXT_PROMPT --cond_scale CONTROLLING_SCALE --unconditional_guidance_scale CLASSIFIER_FREE_GUIDANCE_SCALE --outdir OUTPUT_PATH -b CONFIG_PATH --seed SEED --truncation_steps TRUNCATION_STEPS --use_neg_prompt
```

You can refer to this example command line:

```bash
python generate-single-image.py --cond_type mask --indir examples/horse.png --resume checkpoints/sdv14_mask.pth --caption "a horse standing on the moon surface" --cond_scale 2.0 --unconditional_guidance_scale 6.0 --outdir outputs/ -b configs/sd-mask.yaml --seed 23 --truncation_steps 600 --use_neg_prompt
```

We suggest the following settings to achieve optimal performance for various conditions:

| Condition | Setting | Model Weight | Controlling Scale | Truncation Steps |
|---|---|---|---|---|
| Canny Edge | Unconditional Generation | `sd_celeb_edge.pth` | 2.0 | 500 |
| HED Edge | Unconditional Generation | `sd_celeb_edge.pth` | 2.0 | 500 |
| User Sketch | Unconditional Generation | `sd_celeb_edge.pth` | 2.0 | 600 |
| Color Stroke | Unconditional Generation | `sd_celeb_color.pth` | 2.0 | 600 |
| Image Palette | Unconditional Generation | `sd_celeb_color.pth` | 2.0 | 800 |
| Canny Edge | T2I Generation | `sdv14_edge.pth` | 2.0 | 500 |
| HED Edge | T2I Generation | `sdv14_edge.pth` | 2.5 | 500 |
| User Sketch | T2I Generation | `sdv14_edge.pth` | 2.0 | 600 |
| Color Stroke | T2I Generation | `sdv14_color.pth` | 2.0 | 600 |
| Image Palette | T2I Generation | `sdv14_color.pth` | 2.0 | 800 |
| Saliency Mask | T2I Generation | `sdv14_mask.pth` | 2.0 | 600 |
| User Scribble | T2I Generation | `sdv14_mask.pth` | 2.0 | 700 |
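When adapting LaCon to a condition or model weight not covered in the table above, a small grid sweep over the controlling scale and truncation steps is an easy way to locate a good operating point. A hypothetical example, reusing the flags from the single-image command above:

```bash
# Hypothetical grid sweep over controlling scale and truncation steps;
# all flags are the same as in the single-image example above.
for scale in 1.5 2.0 2.5; do
  for steps in 500 600 700 800; do
    python generate-single-image.py --cond_type mask --indir examples/horse.png \
      --resume checkpoints/sdv14_mask.pth --caption "a horse standing on the moon surface" \
      --cond_scale "$scale" --truncation_steps "$steps" \
      --unconditional_guidance_scale 6.0 --outdir "outputs/sweep-${scale}-${steps}" \
      -b configs/sd-mask.yaml --seed 23 --use_neg_prompt
  done
done
```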

<u><small><🎯Back to Table of Contents></small></u>

## Evaluation <!-- omit in toc -->

Prepare the test set following the data structure below:

```
data/
├── bdcn-edges
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── saliency-masks
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── color-strokes
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── image-palette
│   ├── 1.png
│   ├── 2.png
│   └── ...
├── coco-captions
│   ├── 1.txt
│   ├── 2.txt
│   └── ...
└── images
```

Execute the following command line to test all data samples in the test set:

```bash
python generate-batch-image.py -b CONFIG_PATH --indir DATA_FILELIST_PATH --text CAPTION_PATH --target_cond CONDITION_PATH --resume CONDITION_ALIGNER_PATH --cond_scale CONTROLLING_SCALE --truncation_steps TRUNCATION_STEPS
```

You can refer to this example command line:

```bash
python generate-batch-image.py -b configs/sd-mask.yaml --indir data/coco2017val/data_flist.txt --text data/coco2017val/coco-captions --target_cond data/coco2017val/saliency-masks --resume checkpoints/sdv14_mask.pth --cond_scale 2.0 --truncation_steps 600
```
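For a quick, self-contained check of FID and CLIP Score, the sketch below uses the external `torchmetrics` library; note that this is not the repo's official evaluation code under `evaluation-metrics`, which should be used to reproduce the numbers reported below. torchmetrics reports CLIP Score on a 0-100 scale, so it is divided by 100 here to match the table's convention.

```python
# Minimal FID / CLIP Score sketch with torchmetrics (assumption: torchmetrics
# and its CLIP dependencies are installed). Replace the random tensors with
# real and generated images loaded as uint8 (N, 3, H, W) tensors.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

real_images = torch.randint(0, 256, (8, 3, 256, 256), dtype=torch.uint8)  # placeholder
fake_images = torch.randint(0, 256, (8, 3, 256, 256), dtype=torch.uint8)  # placeholder
captions = ["a photo"] * 8                                                # placeholder

fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)

clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
clip_score.update(fake_images, captions)

print(f"FID: {fid.compute().item():.2f}")
print(f"CLIP Score: {clip_score.compute().item() / 100:.4f}")  # 0-1 scale as in the table
```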

To compute evaluation metrics (e.g., FID and CLIP Score), please refer to this document for more details. We report the performance of LaCon on the COCO 2017 validation set in the following table:

| Condition | Model Weight | FID | CLIP Score |
|---|---|---|---|
| HED Edge | `sdv14_edge.pth` | 21.02 | 0.2590 |
| Color Stroke | `sdv14_color.pth` | 20.27 | 0.2589 |
| Image Palette | `sdv14_color.pth` | 20.61 | 0.2580 |
| Saliency Mask | `sdv14_mask.pth` | 20.94 | 0.2617 |

<u><small><🎯Back to Table of Contents></small></u>

## Results <!-- omit in toc -->

<details> <summary> We demonstrate results generated by LaCon under various conditions in the following figures. </summary> <div align="center"> Canny Edge </div>


<div align="center"> HED Edge </div>


<div align="center"> User Sketch </div>


<div align="center"> Color Stroke </div>


<div align="center"> Image Palette </div>


<div align="center"> Mask </div>


</details>

<u><small><🎯Back to Table of Contents></small></u>

## Citation <!-- omit in toc -->

If you find our paper helpful to your work, please cite it with the following BibTeX reference:

```bibtex
@misc{liu-etal-2024-lacon,
      title={{LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis}},
      author={Liu, Chang and Li, Rui and Zhang, Kaidong and Luo, Xin and Liu, Dong},
      year={2024},
      eprint={2305.11520},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

<u><small><🎯Back to Table of Contents></small></u>

## Stars, Forks, and Star History <!-- omit in toc -->

[![Stargazers repo roster for @AlonzoLeeeooo/LCDG](https://reporoster.com/stars/AlonzoLeeeooo/LCDG)](https://github.com/AlonzoLeeeooo/LCDG/stargazers)

[![Forkers repo roster for @AlonzoLeeeooo/LCDG](https://reporoster.com/forks/AlonzoLeeeooo/LCDG)](https://github.com/AlonzoLeeeooo/LCDG/network/members)

<p align="center"> <a href="https://star-history.com/#AlonzoLeeeooo/LCDG&Date" target="_blank"> <img width="500" src="https://api.star-history.com/svg?repos=AlonzoLeeeooo/LCDG&type=Date" alt="Star History Chart"> </a> </p>

<u><small><🎯Back to Table of Contents></small></u>