News and ToDo List
- Improve latent-space training (for a fair comparison with previous methods, we train from scratch on COCO-stuff rather than fine-tuning from Stable Diffusion)
- Release the pretrained LayoutDiffusion on latent space !!!COMING SOON!!!
- Improve README and code usage instructions
- Clean up code
- Code for Training on Latent Space using AutoEncoderKL
- Release tools for evaluation
- 2023-04-09: Release pre-trained model
- 2023-04-09: Release instructions for environment and training
- 2023-04-09: Release Gradio Webui Demo
- 2023-03-30: Publish complete code
- 2023-02-27: Accepted by CVPR2023
- 2022-11-11: Submitted to CVPR2023
- 2022-07-08: Publish initial code
Introduction
This repository is the official implementation of CVPR2023: LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation.
The code is heavily based on openai/guided-diffusion, with the following modifications:
- Added support for PyTorch distributed training.
- Added support for OmegaConf configuration files in ./configs for easy control.
- Added support for layout-to-image generation by introducing a layout encoder (layout fusion module or LFM) and object-aware cross-attention (OaCA).
Gradio Webui Demo
Pipeline
Visualizations on COCO-stuff
Setup Environment
conda create -y -n LayoutDiffusion python=3.8
conda activate LayoutDiffusion
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda install -y imageio==2.9.0
pip install omegaconf opencv-python h5py==3.2.1 gradio==3.38.0
# if you run into bugs, try `pip install -U gradio`
pip install -e ./repositories/dpm_solver
python setup.py build develop
Gradio Webui Demo (No need for setup of dataset)
python scripts/launch_gradio_app.py \
--config_file configs/COCO-stuff_256x256/LayoutDiffusion_large.yaml \
sample.pretrained_model_path=./pretrained_models/COCO-stuff_256x256_LayoutDiffusion_large_ema_1150000.pt
Add '--share' after '--config_file XXX' to create a shareable link for remote access.
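The launch command above passes `sample.pretrained_model_path=...` as a dotted `key=value` override after the config file. The following is a minimal, stdlib-only sketch of how such overrides can be merged into a nested config. The helper name `apply_dotlist` and the sample config are hypothetical; the repository relies on OmegaConf for this, not on the code below.

```python
from copy import deepcopy


def apply_dotlist(config, overrides):
    """Return a copy of `config` with 'a.b.c=value' overrides applied.

    Hypothetical helper for illustration only; values are kept as strings.
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Assumed stand-in for a config loaded from a YAML file.
base = {"sample": {"pretrained_model_path": ""}}
merged = apply_dotlist(
    base,
    ["sample.pretrained_model_path=./pretrained_models/model.pt"],
)
print(merged["sample"]["pretrained_model_path"])
```

The original config is left untouched, so the same base file can be reused with different overrides.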
Setup Dataset
See here
Pretrained Models
Dataset | Resolution | Sampling steps, FID (sample images x repeats) | Link (TODO) |
---|---|---|---|
COCO-Stuff 2017 segmentation challenge<br/>(deprecated coco-stuff, not full coco-stuff) | 256 x 256 | steps=25 <br/> FID=15.61 ( 3097 x 5 ) <br/> FID=31.68 ( 2048 x 1 ) | Google drive |
COCO-Stuff 2017 segmentation challenge<br/>(deprecated coco-stuff, not full coco-stuff) | 256 x 256 | waiting | Google drive |
COCO-Stuff 2017 segmentation challenge<br/>(deprecated coco-stuff, not full coco-stuff) | 128 x 128 | steps=25 <br/> FID=16.57 ( 3097 x 5 ) | Google drive |
VG | 256 x 256 | steps=25 <br/> FID=15.63 ( 5097 x 1 ) | Google drive |
VG | 128 x 128 | steps=25 <br/> FID=16.35 ( 5097 x 1 ) | Google drive |
Training on Latent Space
- Download the first-stage model vae-8
cd pretrained_models
git clone https://huggingface.co/stabilityai/sd-vae-ft-ema
cd sd-vae-ft-ema
wget https://huggingface.co/stabilityai/sd-vae-ft-ema/resolve/main/diffusion_pytorch_model.bin -O diffusion_pytorch_model.bin
wget https://huggingface.co/stabilityai/sd-vae-ft-ema/resolve/main/diffusion_pytorch_model.safetensors -O diffusion_pytorch_model.safetensors
pip install --upgrade "diffusers[torch]"
python -m torch.distributed.launch \
--nproc_per_node 8 \
scripts/image_train_for_layout.py \
--config_file ./configs/COCO-stuff_256x256/latent_LayoutDiffusion_large.yaml
Training on Image Space
python -m torch.distributed.launch \
--nproc_per_node 8 \
scripts/image_train_for_layout.py \
--config_file ./configs/COCO-stuff_256x256/LayoutDiffusion_large.yaml
Sampling
pip install --upgrade "diffusers[torch]"
- bash/quick_sample.bash for quick sampling
- bash/sample.bash for sampling the entire test dataset
Evaluation
[Important] For each metric, first set up the environment as described in the corresponding repo.
FID
Fréchet Inception Distance (FID) was evaluated using TTUR.
After sampling, use the following command to measure the FID score:
CUDA_VISIBLE_DEVICES=0 python fid.py path/to/generated_imgs path/to/gt_imgs --gpu 0
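For intuition, FID is the Fréchet distance between two Gaussians fitted to image features: ||mu_1 - mu_2||^2 + Tr(S_1 + S_2 - 2 (S_1 S_2)^(1/2)). The sketch below is a simplified illustration that assumes diagonal covariances, in which case the trace term reduces to a sum of squared standard-deviation gaps. It is not what `fid.py` computes (that uses full covariances of Inception features), and the toy feature vectors are made up.

```python
import math
from statistics import fmean, stdev


def diagonal_frechet_distance(feats_a, feats_b):
    """Frechet distance between two feature sets, assuming diagonal covariance.

    Simplifying assumption for illustration; real FID uses full covariances.
    """
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [row[d] for row in feats_a]
        col_b = [row[d] for row in feats_b]
        mean_gap = fmean(col_a) - fmean(col_b)
        std_gap = stdev(col_a) - stdev(col_b)
        # Per dimension: (mu1 - mu2)^2 + (sigma1 - sigma2)^2
        total += mean_gap ** 2 + std_gap ** 2
    return total


# Toy 2-D "features"; real features come from an Inception network.
generated = [[0.0, 1.0], [2.0, 3.0]]
real = [[1.0, 1.0], [3.0, 3.0]]
print(diagonal_frechet_distance(generated, real))
```

Identical feature sets give a distance of zero, and the value grows as the means or spreads drift apart.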
IS
Inception Score (IS) was evaluated using Improved-GAN.
After sampling, use the following command to measure the IS:
cd inception_score
CUDA_VISIBLE_DEVICES=0 python model.py --path path/to/generated_imgs
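As a reminder of what the score measures, IS is exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is the classifier's prediction for one image and p(y) is the average prediction over all images. The sketch below computes this from a list of probability vectors. It is a minimal illustration with made-up inputs; the referenced implementation obtains the probabilities from an Inception network and may additionally average over several splits.

```python
import math


def inception_score(probs):
    """Compute exp(mean KL(p(y|x) || p(y))) from per-image class probabilities.

    `probs` is a list of probability vectors, each summing to 1.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    kl_sum = 0.0
    for p in probs:
        for c in range(num_classes):
            if p[c] > 0.0:  # terms with zero probability contribute nothing
                kl_sum += p[c] * math.log(p[c] / marginal[c])
    return math.exp(kl_sum / n)


# Confident and diverse predictions give a high score ...
confident = [[1.0, 0.0], [0.0, 1.0]]
# ... while uninformative predictions give the minimum score of 1.
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(inception_score(confident), inception_score(uniform))
```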
DS
Diversity Score (DS) was evaluated using PerceptualSimilarity.
We modified lpips_2dirs.py to compute the mean and variance of DS automatically; please refer to this.
After sampling, use the following command to measure the DS:
CUDA_VISIBLE_DEVICES=0 python lpips_2dirs.py -d0 path/to/generated_imgs_0 -d1 path/to/generated_imgs_1 -o imgs/example_dists.txt --use_gpu
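The mean and variance mentioned above can also be recomputed from a distances file after the fact. The sketch below assumes each line of the output file has the form `<file name>: <distance>`; that format, the helper name `summarize_distances`, and the sample values are assumptions, so check them against the actual file before relying on it.

```python
from statistics import fmean, pvariance


def summarize_distances(lines):
    """Return (mean, variance) of the per-pair distances in `lines`.

    Assumed line format: "<file name>: <distance>".
    """
    distances = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so file names containing ':' still parse.
        _, _, value = line.rpartition(":")
        distances.append(float(value))
    return fmean(distances), pvariance(distances)


# Made-up sample lines standing in for the contents of the distances file.
example = ["0001.png: 0.50", "0002.png: 0.70", "0003.png: 0.60"]
mean, variance = summarize_distances(example)
print(f"DS mean = {mean:.4f}, variance = {variance:.6f}")
```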
YOLO Score
YOLO Score was evaluated using LAMA.
Since we filter the objects and images in the datasets, we think it is better to evaluate bbox mAP only on the filtered annotations. We therefore modified test.py to measure the YOLO Score on both the full annotations (using instances_val2017.json from the COCO dataset) and the filtered annotations.
After sampling, use the following command to measure the YOLO Score:
cd yolo_experiments
cd data
CUDA_VISIBLE_DEVICES=0 python test.py --image_path path/to/generated_imgs
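To make the idea of "filtered annotations" concrete, the sketch below filters COCO-style annotations by relative box area and by the number of objects per image. The function name and all thresholds are hypothetical placeholders, not the values or code used in this repository; only the `[x, y, width, height]` box layout follows the COCO format.

```python
def filter_annotations(images, annotations, min_area_ratio=0.02,
                       min_objects=3, max_objects=8):
    """Keep large-enough boxes, then keep images with a valid object count.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    image_area = {img["id"]: img["width"] * img["height"] for img in images}
    per_image = {}
    for ann in annotations:
        _, _, box_w, box_h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if box_w * box_h / image_area[ann["image_id"]] >= min_area_ratio:
            per_image.setdefault(ann["image_id"], []).append(ann)
    kept = []
    for anns in per_image.values():
        if min_objects <= len(anns) <= max_objects:
            kept.extend(anns)
    return kept


# Made-up toy data in COCO-like structure.
images = [{"id": 1, "width": 100, "height": 100},
          {"id": 2, "width": 100, "height": 100}]
annotations = [
    {"id": 10, "image_id": 1, "bbox": [0, 0, 50, 50]},
    {"id": 11, "image_id": 1, "bbox": [0, 0, 5, 5]},    # too small, dropped
    {"id": 12, "image_id": 2, "bbox": [0, 0, 20, 20]},
    {"id": 13, "image_id": 2, "bbox": [10, 10, 30, 30]},
    {"id": 14, "image_id": 2, "bbox": [10, 10, 40, 40]},
]
print([ann["id"] for ann in filter_annotations(images, annotations)])
```

Evaluating mAP against the output of such a filter compares detections only with objects the generator was actually conditioned on.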
CAS
Classification Score (CAS) was evaluated using pytorch_image_classification.
We crop the ground-truth (GT) box regions from the images and resize each object crop to 32×32, labeled with its class. We then train a ResNet101 classifier on the crops from generated images and test it on the crops from real images. Finally, measure CAS using the generated images:
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config configs/test.yaml
You should configure the ckpt path and dataset info in configs/test.yaml.
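The cropping step described above needs integer pixel bounds for each GT box. The sketch below converts a COCO-style `[x, y, width, height]` box into clamped `(left, top, right, bottom)` bounds; the helper name `crop_rectangle` is hypothetical, and the actual crop and resize to 32×32 would be done afterwards with an image library of your choice.

```python
import math


def crop_rectangle(bbox, image_width, image_height):
    """Convert a COCO-style [x, y, width, height] box to integer
    (left, top, right, bottom) pixel bounds clamped to the image.

    Hypothetical helper for illustration only.
    """
    x, y, w, h = bbox
    left = max(0, math.floor(x))
    top = max(0, math.floor(y))
    right = min(image_width, math.ceil(x + w))
    bottom = min(image_height, math.ceil(y + h))
    if right <= left or bottom <= top:
        raise ValueError("box lies outside the image")
    return left, top, right, bottom


CROP_SIZE = (32, 32)  # target resolution of each object crop, as stated above
print(crop_rectangle([10.4, 20.6, 30.0, 40.0], 64, 48))
```

Rounding outward keeps the whole object inside the crop, and clamping avoids reading past the image border.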
For beginners
The field of layout-to-image generation is related to scene-graph-to-image generation and still has some confusing issues. You can refer to issues such as:
- the deprecated coco-stuff 2017
- FID, IS, LPIPS, CAS of LostGAN-v2
- IS, FID, LPIPS, CAS of Grid2Im
- IS, SceneIS, FID, SceneFID, LPIPS, CAS of AttrLostGAN
However, we recommend ignoring the confusing history and following the latest works, LDM and Frido, to work on a relatively new benchmark.
Cite
@InProceedings{Zheng_2023_CVPR,
author = {Zheng, Guangcong and Zhou, Xianpan and Li, Xuewei and Qi, Zhongang and Shan, Ying and Li, Xi},
title = {LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {22490-22499}
}