
SINE <br><sub> <ins>SIN</ins>gle Image <ins>E</ins>diting with Text-to-Image Diffusion Models</sub>

Colab

Project | ArXiv

This repository contains the code for the CVPR 2023 paper SINE: SINgle Image Editing with Text-to-Image Diffusion Models. For more visualization results, please check our webpage.

SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang <sup>1</sup>, Ligong Han <sup>1</sup>, Arnab Ghosh <sup>2</sup>, Dimitris Metaxas <sup>1</sup>, and Jian Ren <sup>2</sup>
<sup>1</sup> Rutgers University <sup>2</sup> Snap Inc.
CVPR 2023.

<div align="center"> <a><img src="assets/overview_finetuning.png" width="500" ></a> <a><img src="assets/overview_editing.png" width="500" ></a> </div>

Setup

First, clone the repository and install the dependencies:

```shell
git clone git@github.com:zhang-zx/SINE.git
```

Then install the dependencies by following the instructions.

Alternatively, you can use the following Docker image:

```shell
docker pull sunggukcha/sine
```

To fine-tune the model, first download the pre-trained model.

Data Preparation

The data used in the paper can be found here.

Fine-tuning

Fine-tuning w/o patch-based training scheme

```shell
IMG_PATH=path/to/image
CLS_WRD='coarse class word'
NAME='name of the experiment'

python main.py \
    --base configs/stable-diffusion/v1-finetune_picture.yaml \
    -t --actual_resume /path/to/pre-trained/model \
    -n $NAME --gpus 0, --logdir ./logs \
    --data_root $IMG_PATH \
    --reg_data_root $IMG_PATH --class_word $CLS_WRD
```

Fine-tuning with patch-based training scheme

```shell
IMG_PATH=path/to/image
CLS_WRD='coarse class word'
NAME='name of the experiment'

python main.py \
    --base configs/stable-diffusion/v1-finetune_patch_picture.yaml \
    -t --actual_resume /path/to/pre-trained/model \
    -n $NAME --gpus 0, --logdir ./logs \
    --data_root $IMG_PATH \
    --reg_data_root $IMG_PATH --class_word $CLS_WRD
```
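The patch-based scheme trains on random crops of the single training image rather than the full frame, so the model sees many sub-views of one picture. The sketch below is a hypothetical illustration of such patch sampling; the helper name, crop size, and the normalized-position return value are assumptions for illustration, not the repository's actual implementation:

```python
import random
import numpy as np

def sample_patch(image: np.ndarray, patch_size: int = 512):
    """Randomly crop a square patch from an (H, W, C) image.

    Also returns the crop's normalized (top, left) corner, which a
    patch-based scheme can feed to the model so it learns where each
    patch sits inside the full image.
    """
    h, w = image.shape[:2]
    # Clamp so the crop always fits inside the image.
    top = random.randint(0, max(0, h - patch_size))
    left = random.randint(0, max(0, w - patch_size))
    crop = image[top:top + patch_size, left:left + patch_size]
    return crop, (top / h, left / w)

# Illustrative usage on a dummy 768x1024 image.
crop, pos = sample_patch(np.zeros((768, 1024, 3)))
```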

Model-based Image Editing

Editing with one model's guidance

```shell
LOG_DIR=/path/to/logdir
python scripts/stable_txt2img_guidance.py --ddim_eta 0.0 --n_iter 1 \
    --scale 10 --ddim_steps 100 \
    --sin_config configs/stable-diffusion/v1-inference.yaml \
    --sin_ckpt "$LOG_DIR/checkpoints/last.ckpt" \
    --prompt "prompt for pre-trained model[SEP]prompt for fine-tuned model" \
    --cond_beta 0.4 \
    --range_t_min 500 --range_t_max 1000 --single_guidance \
    --skip_save --H 512 --W 512 --n_samples 2 \
    --outdir $LOG_DIR
```
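The `--prompt` argument packs both prompts into a single string separated by the literal token `[SEP]`: the text before the separator goes to the pre-trained model and the text after it goes to the fine-tuned model. A minimal sketch of that convention (the helper name and the example prompts are illustrative, not part of the repo):

```python
def split_guidance_prompt(prompt: str):
    """Split a SINE-style prompt on the literal [SEP] token.

    The first part is fed to the pre-trained model; the remaining
    part(s) go to the fine-tuned model(s).
    """
    parts = prompt.split("[SEP]")
    if len(parts) < 2:
        raise ValueError("expected at least one [SEP] separator")
    return parts[0], parts[1:]

# Illustrative prompts only.
pretrained, finetuned = split_guidance_prompt(
    "a photo of a castle[SEP]a photo of a castle covered in snow"
)
```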

Editing with multiple models' guidance

```shell
python scripts/stable_txt2img_multi_guidance.py --ddim_eta 0.0 --n_iter 2 \
    --scale 10 --ddim_steps 100 \
    --sin_ckpt path/to/ckpt1 path/to/ckpt2 \
    --sin_config ./configs/stable-diffusion/v1-inference.yaml \
    configs/stable-diffusion/v1-inference.yaml \
    --prompt "prompt for pre-trained model[SEP]prompt for fine-tuned model1[SEP]prompt for fine-tuned model2" \
    --beta 0.4 0.5 \
    --range_t_min 400 400 --range_t_max 1000 1000 --single_guidance \
    --H 512 --W 512 --n_samples 2 \
    --outdir path/to/output_dir
```
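Per the flags above, each fine-tuned model contributes to the denoising prediction with its `--beta` weight, and only for timesteps inside `[--range_t_min, --range_t_max]`. The sketch below shows one plausible reading of that blend as a convex combination of noise predictions; it illustrates the roles of the flags and is not the repository's exact guidance formula:

```python
import numpy as np

def blend_predictions(eps_pretrained, finetuned_eps, betas, t, t_min, t_max):
    """Convex blend of noise predictions, active only for t in [t_min, t_max].

    eps_pretrained : prediction from the frozen pre-trained model
    finetuned_eps  : list of predictions from the fine-tuned models
    betas          : one weight per fine-tuned model (sum <= 1)
    """
    if not (t_min <= t <= t_max):
        # Outside the guidance range, fall back to the pre-trained model.
        return eps_pretrained
    out = (1.0 - sum(betas)) * eps_pretrained
    for eps, beta in zip(finetuned_eps, betas):
        out = out + beta * eps
    return out
```

With `--beta 0.4 0.5`, the pre-trained model keeps a residual weight of 0.1 inside the guidance range and full weight outside it.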

Diffusers library Example

Support for the Diffusers library is still under development. The results in our paper were obtained with the earlier LDM-based code above.

Training

```shell
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export IMG_PATH="path/to/image"
export OUTPUT_DIR="path/to/output_dir"

accelerate launch diffusers_train.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --img_path=$IMG_PATH \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="prompt for fine-tuning" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=NUMBERS_OF_STEPS \
  --checkpointing_steps=FREQUENCY_FOR_CHECKPOINTING \
  --patch_based_training # OPTIONAL: add this flag for patch-based training scheme
```

Sampling


```shell
python diffusers_sample.py \
    --pretrained_model_name_or_path "path/to/output_dir" \
    --prompt "prompt for fine-tuned model" \
    --editing_prompt "prompt for pre-trained model"
```

Visualization Results

Some of the editing results are shown below. See more results on our webpage.


Acknowledgments

This code builds on the following implementations: Dreambooth-Stable-Diffusion and stable-diffusion. The implementation with Diffusers library support is largely based on Dreambooth. Many thanks to their authors!

Reference

If our work or code helps you, please consider citing our paper. Thank you!

```bibtex
@article{zhang2022sine,
  title={SINE: SINgle Image Editing with Text-to-Image Diffusion Models},
  author={Zhang, Zhixing and Han, Ligong and Ghosh, Arnab and Metaxas, Dimitris and Ren, Jian},
  journal={arXiv preprint arXiv:2212.04489},
  year={2022}
}
```