CSD

Official implementation of the paper Collaborative Score Distillation for Consistent Visual Editing (NeurIPS 2023).

Subin Kim*<sup>1</sup>, Kyungmin Lee*<sup>1</sup>, June Suk Choi<sup>1</sup>, Jongheon Jeong<sup>1</sup>, Kihyuk Sohn<sup>2</sup>, Jinwoo Shin<sup>1</sup>.
<sup>1</sup>KAIST, <sup>2</sup>Google Research
paper | project page | arXiv

TL;DR: Consistent zero-shot visual synthesis across diverse and complex visual modalities

<p align="center"> <img src=assets/concept_figure.png> </p>

Requirements

Environments

Install the required packages as follows:

conda create -n csd python=3.8
conda activate csd
pip install torch==2.0.1 torchvision==0.15.2
pip install diffusers==0.20.0 transformers accelerate mediapy
# for consistency decoder
pip install git+https://github.com/openai/consistencydecoder.git
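
To quickly confirm that the environment matches the versions above, one can run a short check (illustrative only; any recent CUDA-enabled setup with these packages should work):

```python
import torch
import diffusers
import transformers

# Sanity-check installed versions and GPU availability.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
```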

Image Editing

Run the following script on a single GPU:

python csdedit_image.py --device=0 --svgd --fp16 --stride=16 \
--save_path='output/' --data_path='data/river.jpg' \
--batch=4 --tgt_prompt='turn into van gogh style painting' \
--guidance_scale=7.5 --image_guidance_scale=5
<p align="center"> <img src=assets/river_vangogh.png> </p>
python csdedit_image.py --device=0 --svgd --fp16 --stride=16 \
--save_path='output/' --data_path='data/sheeps.jpg' \
--batch=4 --tgt_prompt='turn the sheeps into wolves' \
--guidance_scale=7.5 --image_guidance_scale=5 
<p align="center"> <img src=assets/sheep_wolves.png> </p>

To edit high-resolution images, we encode and decode the latents patch-wise. To enable this, add the '--stride_vae' flag:

python csdedit_image.py --device=0 --svgd --fp16 --stride=16 \
--save_path='output/' --data_path='data/michelangelo.jpeg' \
--batch=8 --tgt_prompt='Re-imagine people are in galaxy' \
--guidance_scale=15 --image_guidance_scale=5 --stride_vae --lr=4.0
<p align="center"> <img src=assets/michelangelo_galaxy.png> </p>

Compositional Image Editing

To edit an image with region-wise prompts, while ensuring smooth transitions between patches that follow different instructions, run:

python csdedit_image_region.py --device 0 --svgd --fp16 \
--save_path 'output/' --data_path 'data/vienna.jpg' \
--tgt_prompt 'turn into sunny weather' 'turn into cloudy weather' 'turn into rainy weather' 'turn into snowy weather' \
--stride 16 --batch 4 --guidance_scale 15 --image_guidance_scale 5
<p align="center"> <img src=assets/region_vienna.png> </p>

Video Editing

Run the following to jointly edit a sequence of video frames:

python csdedit_video.py --device 0 --svgd --fp16 \
--save_path 'output/break/' --data_path 'data/break' \
--tgt_prompt="Change the color of his T-shirt to yellow" \
--guidance_scale=7.5 --image_guidance_scale=1.5 --lr=0.5 \
--rows 2 --cols 12 --num_steps 100
<p align="center"> <img src=assets/break/outputs.gif width="500"> </p>

3D Scene Editing

One can obtain 3D scene editing results by following the Instruct-NeRF2NeRF codebase with a few lines of adaptation; in particular, replace its image-editing step with that of CSD-Edit.

Citation

@inproceedings{
    kim2023collaborative,
    title={Collaborative score distillation for consistent visual editing},
    author={Kim, Subin and Lee, Kyungmin and Choi, June Suk and Jeong, Jongheon and Sohn, Kihyuk and Shin, Jinwoo},
    booktitle={Advances in Neural Information Processing Systems},
    year={2023},
}