# Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
<a href='http://arxiv.org/abs/2403.11105'><img src='https://img.shields.io/badge/arXiv-2403.11105-b31b1b.svg'></a>
Ruibin Li<sup>1</sup> | Ruihuang Li<sup>1</sup> | Song Guo<sup>2</sup> | Lei Zhang<sup>1*</sup> <br> <sup>1</sup>The Hong Kong Polytechnic University, <sup>2</sup>The Hong Kong University of Science and Technology. <br> In ECCV 2024
## 🔎 Overview framework
Pipelines of different inversion methods in text-driven editing. (a) DDIM inversion inverts a real image to a latent noise code, but the inverted code often yields a large reconstruction gap $D_{Rec}$ under higher CFG parameters. (b) NTI optimizes the null-text embedding to narrow the reconstruction gap $D_{Rec}$, while NPI further improves the speed of NTI. (c) DirectInv records the differences between the inversion features and the reconstruction features, and merges them back to achieve high-quality reconstruction. (d) Our SPDInv minimizes the noise gap $D_{Noi}$ instead of $D_{Rec}$, which reduces the impact of the source prompt on the editing process and thus alleviates the artifacts and inconsistent details encountered by previous methods.
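As a rough illustration of the idea in (d), the inversion step can be viewed as a fixed-point problem: rather than optimizing embeddings to shrink the reconstruction gap $D_{Rec}$, one iteratively refines the noise latent so that the inversion mapping returns it unchanged, driving the noise gap $D_{Noi}$ toward zero. The sketch below is a simplified toy, with a hypothetical linear mapping `f` standing in for one prompt-conditioned DDIM inversion step; it is not the actual SPDInv implementation:

```python
import numpy as np

def fixed_point_refine(z_init, f, n_iters=50, lr=0.5):
    """Shrink the noise gap ||f(z) - z|| by simple fixed-point iteration."""
    z = z_init.copy()
    for _ in range(n_iters):
        residual = f(z) - z      # the noise gap D_Noi at the current latent
        z = z + lr * residual    # move z toward the fixed point z* = f(z*)
    return z

# toy contraction mapping standing in for one DDIM inversion step
A = np.array([[0.5, 0.1], [0.0, 0.4]])
b = np.array([1.0, -0.5])
f = lambda z: A @ z + b

z_star = fixed_point_refine(np.zeros(2), f)
# at convergence, the noise gap is near zero
print(np.linalg.norm(f(z_star) - z_star))
```

In the real method, `f` would be the prompt-conditioned inversion step on the latent code, and the refinement would be applied at every diffusion timestep.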
## ⚙️ Dependencies and Installation
```shell
# clone this repository
git clone https://github.com/leeruibin/SPDInv.git
cd SPDInv

# create an environment with python >= 3.8
conda env create -f environment.yaml
conda activate SPDInv
```
## 🚀 Quick Inference
### Run P2P with SPDInv
```shell
python run_SPDInv_P2P.py --input xxx --source [source prompt] --target [target prompt] --blended_word "word1 word2"
```
### Run MasaCtrl with SPDInv
```shell
python run_SPDInv_MasaCtrl.py --input xxx --source [source prompt] --target [target prompt]
```
### Run PNP with SPDInv
To run PNP, first upgrade diffusers to 0.17.1:
```shell
pip install diffusers==0.17.1
```
Then run:
```shell
python run_SPDInv_PNP.py --input xxx --source [source prompt] --target [target prompt]
```
### Run ELITE with SPDInv
For ELITE, first download the pre-trained `global_mapper.pt` checkpoint provided by ELITE and put it into the `checkpoints` folder.
```shell
python run_SPDInv_ELITE.py --input xxx --source [source prompt] --target [target prompt]
```
## 📷 Editing cases with P2P, MasaCtrl, PNP, ELITE

### Editing cases with P2P
<div align="center"> <img src="./figures/cases_P2P.jpg" width = "600" alt="P2P" align=center /> </div>

### Editing cases with MasaCtrl
<div align="center"> <img src="./figures/cases_MasaCtrl.jpg" width = "600" alt="MasaCtrl" align=center /> </div>

### Editing cases with PNP
<div align="center"> <img src="./figures/cases_PNP.jpg" width = "600" alt="PNP" align=center /> </div>

### Editing cases with ELITE
<div align="center"> <img src="./figures/cases_ELITE.jpg" width = "600" alt="ELITE" align=center /> </div>

## Citation
```bibtex
@inproceedings{li2024source,
  title={Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models},
  author={Li, Ruibin and Li, Ruihuang and Guo, Song and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
```
## Acknowledgements
This code is built on the diffusers implementation of Stable Diffusion. It also borrows heavily from Prompt-to-Prompt, Null-Text Inversion, MasaCtrl, ProxEdit, ELITE, Plug-and-Play, and DirectInversion. Thanks to all the contributors!