# DiffEdit with Stable Diffusion
Unofficial implementation of "DiffEdit: Diffusion-based semantic image editing with mask guidance" with Stable Diffusion. For better sample efficiency, we use DPM-Solver as the sampling method.
Paper: https://arxiv.org/abs/2210.11427
## Requirements
A suitable conda environment named `ldm` can be created and activated with:
```
conda env create -f environment.yaml
conda activate ldm
```
You can also update an existing latent diffusion environment by running:

```
conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
```
## Usage

Run `diffedit.ipynb` in Jupyter Notebook.

Important parameters:
```python
encode_ratio: float = 0.6
# encode_ratio controls how much noise is added to the input image.
# If the ratio is near 0, the original image is likely to come back
# almost unchanged; if it is near 1.0, it may cause problems, since
# little of the original content survives the noising (sketched below).

clamp_rate: float = 4
# The difference map is clamped to map.mean() * clamp_rate, scaled into
# [0, 1], and then binarized at 0.5. So a map value greater than
# map.mean() * clamp_rate * 0.5 is encoded as 1, and anything smaller
# is encoded as 0. The larger clamp_rate is, the fewer pixels are
# encoded as 1; the smaller it is, the more pixels are encoded as 1
# (sketched below).

ddim_steps: int = 15
# With DPM-Solver, the step count does not need to be large.
# You are encouraged to experiment with the other DPM-Solver
# parameters as well (e.g., order, predict_x0); see the example below.
```
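For intuition, here is a minimal sketch of how an encode ratio can map to a noising timestep, assuming a `diffusers` `DDPMScheduler`; `encode_image` and its wiring are illustrative assumptions, not the notebook's actual code.

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

def encode_image(latents: torch.Tensor, encode_ratio: float) -> torch.Tensor:
    # Map the ratio onto the training timestep range: near 0 keeps the
    # latents almost clean, near 1.0 noises them toward pure Gaussian noise.
    t = torch.tensor(int(encode_ratio * (scheduler.config.num_train_timesteps - 1)))
    noise = torch.randn_like(latents)
    # q(x_t | x_0): add schedule-weighted noise at timestep t.
    return scheduler.add_noise(latents, noise, t)
```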
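The clamp-and-threshold rule for `clamp_rate` translates directly into a few lines of tensor code. A minimal sketch, assuming `diff_map` holds the per-pixel difference between the noise estimates under the two prompts (an assumption about the notebook's internals):

```python
import torch

def binarize_mask(diff_map: torch.Tensor, clamp_rate: float = 4.0) -> torch.Tensor:
    # Clamp to map.mean() * clamp_rate, then rescale into [0, 1].
    ceiling = diff_map.mean().item() * clamp_rate
    scaled = diff_map.clamp(0.0, ceiling) / ceiling
    # Split at 0.5: values above map.mean() * clamp_rate * 0.5 become 1.
    return (scaled > 0.5).float()
```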
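If you want to play with those solver knobs through `diffusers`, here is a hedged configuration sketch using `DPMSolverMultistepScheduler` (the notebook may instead wire up the standalone dpm-solver package, whose `order` and `predict_x0` arguments these options mirror):

```python
from diffusers import DPMSolverMultistepScheduler

scheduler = DPMSolverMultistepScheduler(
    num_train_timesteps=1000,
    solver_order=2,                # analogous to dpm-solver's `order`
    algorithm_type="dpmsolver++",  # the "++" variant predicts x0, like predict_x0=True
)
scheduler.set_timesteps(15)        # matches ddim_steps above; few steps suffice
```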
## Results
| A bowl of fruits | generated mask | A bowl of pears |
|---|---|---|