CSD

Official implementation of the paper Collaborative Score Distillation for Consistent Visual Editing (NeurIPS 2023).

Subin Kim*<sup>1</sup>, Kyungmin Lee*<sup>1</sup>, June Suk Choi<sup>1</sup>, Jongheon Jeong<sup>1</sup>, Kihyuk Sohn<sup>2</sup>, Jinwoo Shin<sup>1</sup>.
<sup>1</sup>KAIST, <sup>2</sup>Google Research
paper | project page | arXiv

TL;DR: Consistent zero-shot visual synthesis across diverse and complex visual modalities

<p align="center"> <img src=assets/concept_figure.png> </p>

Requirements

Environments

Install the required packages as follows:

conda create -n csd python=3.8
conda activate csd
pip install torch==2.0.1 torchvision==0.15.2
pip install diffusers==0.20.0 transformers accelerate mediapy
# for consistency decoder
pip install git+https://github.com/openai/consistencydecoder.git
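
To quickly confirm that the environment matches the versions above, one can run a short check (illustrative only; any recent CUDA-enabled setup with these packages should work):

```python
import torch
import diffusers
import transformers

# Sanity-check installed versions and GPU availability.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
```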

Image Editing

Run the following script on a single GPU:

python csdedit_image.py --device=0 --svgd --fp16 --stride=16 \
--save_path='output/' --data_path='data/river.jpg' \
--batch=4 --tgt_prompt='turn into van gogh style painting' \
--guidance_scale=7.5 --image_guidance_scale=5
<p align="center"> <img src=assets/river_vangogh.png> </p>
python csdedit_image.py --device=0 --svgd --fp16 --stride=16 \
--save_path='output/' --data_path='data/sheeps.jpg' \
--batch=4 --tgt_prompt='turn the sheeps into wolves' \
--guidance_scale=7.5 --image_guidance_scale=5 
<p align="center"> <img src=assets/sheep_wolves.png> </p>

To edit high-resolution images, we encode and decode the latents patch-wise. To enable this, add the '--stride_vae' flag:

python csdedit_image.py --device=0 --svgd --fp16 --stride=16 \
--save_path='output/' --data_path='data/michelangelo.jpeg' \
--batch=8 --tgt_prompt='Re-imagine people are in galaxy' \
--guidance_scale=15 --image_guidance_scale=5 --stride_vae --lr=4.0
<p align="center"> <img src=assets/michelangelo_galaxy.png> </p>

Compositional Image Editing

To edit an image with region-wise prompts, while ensuring smooth transitions between patches that follow different instructions, run:

python csdedit_image_region.py --device 0 --svgd --fp16 \
--save_path 'output/' --data_path 'data/vienna.jpg' \
--tgt_prompt 'turn into sunny weather' 'turn into cloudy weather' 'turn into rainy weather' 'turn into snowy weather' \
--stride 16 --batch 4 --guidance_scale 15 --image_guidance_scale 5
<p align="center"> <img src=assets/region_vienna.png> </p>

Video Editing

Run the following to jointly edit a sequence of video frames:

python csdedit_video.py --device 0 --svgd --fp16 \
--save_path 'output/break/' --data_path 'data/break' \
--tgt_prompt="Change the color of his T-shirt to yellow" \
--guidance_scale=7.5 --image_guidance_scale=1.5 --lr=0.5 \
--rows 2 --cols 12 --num_steps 100
<p align="center"> <img src=assets/break/outputs.gif width="500"> </p>

3D Scene Editing

One can obtain 3D scene editing results by following the Instruct-NeRF2NeRF codebase with a few lines of adaptation; in particular, replace its image-editing step with that of CSD-Edit.

Citation

@inproceedings{
    kim2023collaborative,
    title={Collaborative score distillation for consistent visual editing},
    author={Kim, Subin and Lee, Kyungmin and Choi, June Suk and Jeong, Jongheon and Sohn, Kihyuk and Shin, Jinwoo},
    booktitle={Advances in Neural Information Processing Systems},
    year={2023},
}