# IterInpaint
The code for **IterInpaint**, a new baseline for layout-guided image generation, as described in the paper:

**Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation**

Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal
[Project Page] [Paper] [Gradio Demo] [Colab Demo]
<img src="./assets/iterinpaint_inference.png" width=1000px>Setup Environment
```bash
conda create -n iterinpaint python=3.9
conda activate iterinpaint
pip install torch torchvision
pip install -r requirements.txt
```
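Optionally, you can verify the installation with a quick check (this assumes `diffusers` is included in `requirements.txt`, since it is needed for the checkpoints below; CPU-only setups will print `False`):

```python
# Quick sanity check of the environment.
import torch
import diffusers

print('torch', torch.__version__, '| diffusers', diffusers.__version__)
print('CUDA available:', torch.cuda.is_available())
```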
## 🧨 Diffusers support
We provide Hugging Face Diffusers checkpoints for IterInpaint, so you can simply load our models as follows:
```python
from diffusers import StableDiffusionInpaintPipeline

# CLEVR checkpoint
pipe = StableDiffusionInpaintPipeline.from_pretrained('j-min/IterInpaint-CLEVR')

# COCO checkpoint
pipe = StableDiffusionInpaintPipeline.from_pretrained('j-min/iterinpaint_sd15inpaint_coco')
```
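Below is a minimal sketch of how the pipeline can be driven iteratively, pasting one object at a time. The `Add {phrase}` prompt template, the gray blank-canvas initialization, and the toy layout are illustrative assumptions here, not the exact recipe from the paper; see the inference notebooks below for the full details.

```python
# A minimal sketch of iterative, layout-guided generation with IterInpaint.
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    'j-min/IterInpaint-CLEVR', torch_dtype=torch.float16
).to('cuda')

size = 512
canvas = Image.new('RGB', (size, size), 'gray')  # blank starting canvas (assumption)

# Toy layout: (x0, y0, x1, y1) pixel boxes with object phrases (assumption).
layout = [
    ((100, 200, 220, 330), 'a red metal cube'),
    ((280, 180, 400, 320), 'a blue rubber sphere'),
]

for box, phrase in layout:
    # Mask only the current box; previously generated regions are kept.
    mask = Image.new('L', (size, size), 0)
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    canvas = pipe(prompt=f'Add {phrase}', image=canvas,
                  mask_image=mask, guidance_scale=4.0).images[0]

canvas.save('iterinpaint_result.png')
```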
## Inference Demos
We provide demos for IterInpaint inference, where you can generate images with your own custom layouts.
### Gradio
Try IterInpaint with your own layouts in the Gradio demo: https://huggingface.co/spaces/j-min/IterInpaint-CLEVR

### Notebooks
- Inference with Diffusers - you can run this notebook on Colab.
- Inference with the original LDM codebase - you need 12GB+ CPU memory to build the model (so you would need Colab Pro).
## Training IterInpaint on CLEVR
<img src="./assets/iterinpaint_training.png" width=1000px>We provide pretrained checkpoints for IterInpaint on CLEVR.
- Original checkpoint: https://huggingface.co/j-min/IterInpaint-CLEVR-original
- Diffusers version: https://huggingface.co/j-min/IterInpaint-CLEVR
Below, we provide the instructions for training IterInpaint on CLEVR.
### 1) Download SD checkpoint
```bash
mkdir preload_model_checkpoints
cd preload_model_checkpoints

# By default, we use the SD v1.5 inpainting checkpoint as the starting point
# (https://huggingface.co/runwayml/stable-diffusion-inpainting).
wget https://huggingface.co/runwayml/stable-diffusion-inpainting/resolve/main/sd-v1-5-inpainting.ckpt

# You can replace it with another checkpoint, such as SD text2image from
# https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
# wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4-full-ema.ckpt
```
### 2) Run Training
We train IterInpaint on 16 V100 GPUs (2 nodes x 8 GPUs per node) with a batch size of 1 per GPU and gradient accumulation of 8, i.e., an effective batch size of 128 (= 16 GPUs x 8 accumulation steps x 1 per GPU), for 20K steps.
We update the U-Net and CLIP text encoder parameters, while freezing the autoencoder.
You can check and change other hyperparameters in the config file (`configs/stable-diffusion/v1-finetune_clevr_iterinpaint_SD15.yaml`).
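If you want to inspect the config programmatically, here is a small sketch, assuming the YAML follows the usual LDM layout with a top-level `model` key (the LDM codebase parses its configs with OmegaConf):

```python
# Print the model section of the training config.
from omegaconf import OmegaConf

cfg = OmegaConf.load('configs/stable-diffusion/v1-finetune_clevr_iterinpaint_SD15.yaml')
print(OmegaConf.to_yaml(cfg.model, resolve=True))  # e.g., learning rate, U-Net params
```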
```bash
config='configs/stable-diffusion/v1-finetune_clevr_iterinpaint_SD15.yaml'
SD_starting_checkpoint='preload_model_checkpoints/sd-v1-5-inpainting.ckpt'
data_root='datasets/clevr_data'
lr=1e-4
fg_task_ratio='030'
job_name='iterinpaint_CLEVR_FG30'
save_dir='results'
batch_size=1

python main.py \
    --base $config \
    --train \
    --nodes 2 \
    --gpus 0,1,2,3,4,5,6,7 \
    --actual_resume $SD_starting_checkpoint \
    --name $job_name \
    --data_root $data_root \
    --val_data_root $data_root \
    --no-test true \
    --lr $lr \
    --batch_size $batch_size \
    --logdir $save_dir/$job_name \
    --fg_task_ratio $fg_task_ratio \
    --seed 42
```
### (Optional) Convert the LDM checkpoint to HF Diffusers format
```bash
# checkpoint output path from training
ckpt_path=xxxx.ckpt
config_file=xxxx.project.yaml

# output path for HF diffusers checkpoint
dump_path=DUMP_PATH

python convert_iterinpaint_ldm_checkpoint_to_diffusers.py \
    --checkpoint_path $ckpt_path \
    --original_config_file $config_file \
    --image_size 512 \
    --prediction_type 'epsilon' \
    --pipeline_type 'FrozenCLIPEmbedder' \
    --extract_ema \
    --dump_path $dump_path
```
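To check that the exported checkpoint loads correctly, you can reload it with Diffusers; `DUMP_PATH` below is the `--dump_path` from above, and the 9-channel assertion reflects the SD inpainting U-Net's input (4 latent + 4 masked-image + 1 mask channels):

```python
# Verify the converted checkpoint loads as an inpainting pipeline.
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained('DUMP_PATH')  # your --dump_path
assert pipe.unet.config.in_channels == 9  # inpainting U-Nets take 9 input channels
print('Converted checkpoint loaded OK')
```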
## CLEVR inference
```bash
dump_dir='eval_images_dump/clevr'
config='configs/stable-diffusion/v1-inference-iterinpaint.yaml'
run_name='iterinpaint_guidance4.0'
# Set n_nodes, n_gpus, and ckpt_path to match your setup and trained checkpoint.

torchrun \
    --nnodes=$n_nodes \
    --nproc_per_node=$n_gpus \
    scripts/clevr_inference.py \
    --eval_data 'clevr' \
    --plms \
    --scale 4.0 \
    --config $config \
    --ckpt $ckpt_path \
    --clevr_dump_dir $dump_dir \
    --save_bbox_viz \
    --name $run_name
```
## LayoutBench inference
```bash
skill_split='number_few' # Change to other skill splits
dump_dir='eval_images_dump/layoutbench'
config='configs/stable-diffusion/v1-inference-iterinpaint.yaml'
run_name='iterinpaint_guidance4.0'
# n_nodes, n_gpus, and ckpt_path as in CLEVR inference above.

torchrun \
    --nnodes=$n_nodes \
    --nproc_per_node=$n_gpus \
    scripts/clevr_inference.py \
    --eval_data 'layoutbench' \
    --plms \
    --scale 4.0 \
    --config $config \
    --ckpt $ckpt_path \
    --layoutbench_dump_dir $dump_dir \
    --skill_split $skill_split \
    --save_bbox_viz \
    --name $run_name
```
## Citation
If you find our project useful in your research, please cite the following paper:
```bibtex
@inproceedings{Cho2024LayoutBench,
  author    = {Jaemin Cho and Linjie Li and Zhengyuan Yang and Zhe Gan and Lijuan Wang and Mohit Bansal},
  title     = {Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation},
  booktitle = {The First Workshop on the Evaluation of Generative Foundation Models},
  year      = {2024},
}
```