
Diffusion Models already have a Semantic Latent Space (ICLR2023 notable-top-25%)


Diffusion Models already have a Semantic Latent Space<br> Mingi Kwon, Jaeseok Jeong, Youngjung Uh <br> ICLR 2023 (notable-top-25%).

Abstract: <br> Diffusion models achieve outstanding generative performance in various domains. Despite their great success, they lack a semantic latent space, which is essential for controlling the generative process. To address the problem, we propose asymmetric reverse process (Asyrp), which discovers the semantic latent space in frozen pretrained diffusion models. Our semantic latent space, named h-space, has nice properties for accommodating semantic image manipulation: homogeneity, linearity, robustness, and consistency across timesteps. In addition, we introduce a principled design of the generative process for versatile editing and quality boosting by quantifiable measures: editing strength of an interval and quality deficiency at a timestep. Our method is applicable to various architectures (DDPM++, iDDPM, and ADM) and datasets (CelebA-HQ, AFHQ-dog, LSUN-church, LSUN-bedroom, and METFACES).

Description

This repo includes the official PyTorch implementation of Asyrp: Diffusion Models already have a Semantic Latent Space.


Real images (top) edited into happy dogs (bottom). So cute!!

Getting Started

We recommend running our code on an NVIDIA GPU with CUDA and cuDNN.
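
Before running anything, you may want a fresh environment. A minimal sketch, assuming a standard PyTorch setup (the package list and versions below are assumptions, not a pinned requirements file):

# Assumed environment setup; pick the PyTorch build matching your CUDA version.
conda create -n asyrp python=3.9 -y
conda activate asyrp
pip install torch torchvision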

Pretrained Models for Asyrp

Asyrp works on the checkpoints of pretrained diffusion models.

| Image Type to Edit | Size | Pretrained Model | Dataset | Reference Repo. |
| --- | --- | --- | --- | --- |
| Human face | 256×256 | Diffusion (Auto) | CelebA-HQ | SDEdit |
| Human face | 256×256 | Diffusion | CelebA-HQ | P2 weighting |
| Human face | 256×256 | Diffusion | FFHQ | P2 weighting |
| Church | 256×256 | Diffusion (Auto) | LSUN-Church | SDEdit |
| Bedroom | 256×256 | Diffusion (Auto) | LSUN-Bedroom | SDEdit |
| Dog face | 256×256 | Diffusion | AFHQ-Dog | ILVR |
| Painting face | 256×256 | Diffusion | METFACES | P2 weighting |
| ImageNet | 256×256 | Diffusion | ImageNet | Guided Diffusion |

Datasets

To precompute latents and find an editing direction in h-space, you need roughly 100+ images in the dataset. You can use either images sampled from the pretrained models or real images from the pretraining dataset.

If you want to use real images, check the dataset URLs:

You can simply modify ./configs/paths_config.py to set the dataset paths.
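
For illustration, a dataset entry in ./configs/paths_config.py might look like the sketch below; the dictionary name DATASET_PATHS and the keys are assumptions, so match them to what the actual file defines:

# Illustrative sketch only -- the name and keys are assumptions, not the real file.
DATASET_PATHS = {
    'CelebA_HQ': '/path/to/celeba_hq/',
    'AFHQ': '/path/to/afhq_dog/',
    'LSUN': '/path/to/lsun/',
    'METFACES': '/path/to/metfaces/',
}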

Custom Datasets

If you want to use a custom dataset, use the config/custom.yml file and point the following arguments at your data:

--custom_train_dataset_dir "your/custom/dataset/dir/train"    \
--custom_test_dataset_dir "your/custom/dataset/dir/test"      \
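
The two arguments suggest a train/test split laid out along these lines (a guess from the argument names; the file names are placeholders):

your/custom/dataset/dir/
├── train/
│   ├── 000001.png
│   └── ...
└── test/
    ├── 000101.png
    └── ...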

Get LPIPS distance

We provide precomputed LPIPS distances for CelebA_HQ, LSUN-Bedroom, LSUN-Church, AFHQ-Dog, and METFACES in ./utils.

If you want to use a custom or any other dataset, we recommend precomputing the LPIPS distances.

To precompute the LPIPS distances used to automatically determine t_edit and t_boost, run the following command using script_get_lpips.sh.

python main.py  --lpips                  \
                --config $config         \
                --exp ./runs/tmp         \
                --edit_attr test         \
                --n_train_img 100        \
                --n_inv_step 1000   
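
Throughout this README, $config is a dataset YAML passed to --config. A hedged substitution sketch (custom.yml is the only config filename this README names; use the YAML matching your dataset and pretrained model):

# custom.yml is a placeholder here; substitute your dataset's config file.
config="custom.yml"
python main.py --lpips --config $config --exp ./runs/tmp \
               --edit_attr test --n_train_img 100 --n_inv_step 1000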

Asyrp

To train the implicit function f, you can optionally prepare two things: 1) LPIPS distances (see "Get LPIPS distance" above) and 2) precomputed latents of real images (see "Precompute real images" below).

We already provide precomputed LPIPS distances for CelebA_HQ, LSUN-Bedroom, LSUN-Church, AFHQ-Dog, and METFACES in ./utils.

If you want to use your own user-defined t_edit (e.g., 500) and t_boost (e.g., 200), you don't need the LPIPS distances.

In that case, you can use the arguments below:

--user_defined_t_edit 500       \
--user_defined_t_addnoise 200   \

If you want to train with sampled images, you don't need to precompute real images. In that case, you can use the argument below:

--load_random_noise

Precompute real images

To precompute the latents of real images (saving time during training), run the following command using script_precompute.sh.

python main.py  --run_train          \
                --config $config     \
                --exp ./runs/tmp     \
                --edit_attr test     \
                --do_train 1         \
                --do_test 1          \
                --n_train_img 100    \
                --n_test_img 32      \
                --bs_train 1         \
                --n_inv_step 50      \
                --n_train_step 50    \
                --n_test_step 50     \
                --just_precompute    
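
For a custom dataset, the directory arguments from "Custom Datasets" above can be appended to the same command. A sketch combining the two documented pieces (the config name and paths are placeholders):

python main.py  --run_train --config custom.yml --exp ./runs/custom        \
                --edit_attr test --do_train 1 --do_test 1                  \
                --n_train_img 100 --n_test_img 32 --bs_train 1             \
                --n_inv_step 50 --n_train_step 50 --n_test_step 50         \
                --just_precompute                                          \
                --custom_train_dataset_dir "your/custom/dataset/dir/train" \
                --custom_test_dataset_dir "your/custom/dataset/dir/test"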

Train the implicit function

To train the implicit function, run the following command using script_train.sh.

python main.py  --run_train                    \
                --config $config               \
                --exp ./runs/example           \
                --edit_attr $guid              \
                --do_train 1                   \
                --do_test 1                    \
                --n_train_img 100              \
                --n_test_img 32                \
                --n_iter 5                     \
                --bs_train 1                   \
                --t_0 999                      \
                --n_inv_step 50                \
                --n_train_step 50              \
                --n_test_step 50               \
                --get_h_num 1                  \
                --train_delta_block            \
                --save_x0                      \
                --use_x0_tensor                \
                --lr_training 0.5              \
                --clip_loss_w 1.0              \
                --l1_loss_w 3.0                \
                --add_noise_from_xt            \
                --lpips_addnoise_th 1.2        \
                --lpips_edit_th 0.33           \
                --sh_file_name $sh_file_name   \

(optional - if you skipped "Get LPIPS distance")
                --user_defined_t_edit 500      \
                --user_defined_t_addnoise 200  \

(optional - if you skipped "Precompute real images")
                --load_random_noise
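
As a concrete sketch, the shell variables in the command above might be filled in as follows; the attribute "smiling" is hypothetical, so use a value your setup actually supports:

# Hypothetical values that script_train.sh might define; edit the script accordingly.
config="custom.yml"             # dataset YAML (placeholder)
guid="smiling"                  # value for --edit_attr (hypothetical attribute)
sh_file_name="script_train.sh"  # assumption: the launching script's own filename
sh script_train.sh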

Inference

After training finishes, you can run inference with various settings using script_inference.sh. We provide several example settings.

python main.py  --run_test                    \
                --config $config              \
                --exp ./runs/example          \
                --edit_attr $guid             \
                --do_train 1                  \
                --do_test 1                   \
                --n_train_img 100             \
                --n_test_img 32               \
                --n_iter 5                    \
                --bs_train 1                  \
                --t_0 999                     \
                --n_inv_step 50               \
                --n_train_step 50             \
                --n_test_step $test_step      \
                --get_h_num 1                 \
                --train_delta_block           \
                --add_noise_from_xt           \
                --lpips_addnoise_th 1.2       \
                --lpips_edit_th 0.33          \
                --sh_file_name $sh_file_name  \
                --save_x0                     \
                --use_x0_tensor               \
                --hs_coeff_delta_h 1.0        \

                (optional - checkpoint)
                --load_from_checkpoint "exp_name"  
                or
                --manual_checkpoint_name "full_path.pth"

(optional - gradual editing)
                --delta_interpolation
                --max_delta 1.0
                --min_delta -1.0
                --num_delta 10

(optional - multiple attributes)
                --multiple_attr "exp1 exp2 exp3"
                --multiple_hs_coeff "1 0.5 1.5"
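
For instance, a hypothetical multi-attribute edit passes matching space-separated values to both flags; "exp_smile" and "exp_young" stand in for the names of your own trained experiments, each scaled by the corresponding coefficient:

--multiple_attr "exp_smile exp_young"   \
--multiple_hs_coeff "1.0 0.5"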

Acknowledgments

Our code is based on DiffusionCLIP.