Awesome
Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model
Shiyuan Yang<sup>1</sup>, Xiaodong Chen<sup>2</sup>, Jing Liao<sup>1</sup>*.
<sup>1</sup> City University of Hong Kong, Hong Kong SAR. <sup>2</sup> Tianjin University, China.
[Paper Link] . Supplementary materials can be found in Arxiv version.
Abstract
Recently, text-to-image denoising diffusion probabilistic models (DDPMs) have demonstrated impressive image generation capabilities and have also been successfully applied to image inpainting. However, in practice, users often require more control over the inpainting process beyond textual guidance, especially when they want to composite objects with customized appearance, color, shape, and layout. Unfortunately, existing diffusion-based inpainting methods are limited to single-modal guidance and require task-specific training, hindering their cross-modal scalability. To address these limitations, we propose Uni-paint, a unified framework for multimodal inpainting that offers various modes of guidance, including unconditional, text-driven, stroke-driven, exemplar-driven inpainting, as well as a combination of these modes. Furthermore, our Uni-paint is based on pretrained Stable Diffusion and does not require task-specific training on specific datasets, enabling few-shot generalizability to customized images. We have conducted extensive qualitative and quantitative evaluations that show our approach achieves comparable results to existing single-modal methods while offering multimodal inpainting capabilities not available in other methods.
Setup
Conda enviromnet
conda env create -f environment.yaml
conda activate ldm
Model
Download pretrained Stable Diffusion v1.4 from here and place it at ckpt/sd-v1-4-full-ema.ckpt
.
Please refer to official SD repo for more details.
CLIP text embedding
Download pre-computed CLIP text embedding (see paper Eq.6 for explanation) from onedrive and place it at ckpt/clip_emb_normalized(49407x768).pth
. Or you can skip downloading now, the code also will generate this file if it's not found, this process may take several mintues.
Usage
Python notebook
- For unconditional/text-driven/stroke-driven inpainting, see
inpaint.ipynb
. - For exemplar-driven inpainting, see
inpaint_with_exemplar.ipynb
.
Gradio demo
We also made an interactive gradio demo for convenient use. Here are the step-by-step guidelines:
-
Launch the demo script
gradio_demo/demo.py
. -
By default, go to http://127.0.0.1:7860/ in your browser, the demo should be displayed there. If you are runing the model on a server, you may forward the demo to your local pc browser by using the command
ssh username@xxx.xxx.xxx.xxx -p 22 -L 7860:localhost:7860
. -
Input image: at the left-top section, provide the input image and draw the mask area.
-
[Optional] Exemplar image: In second column, provide an exemplar image and check the box
Enable exemplar
. -
Initialize: Click
Initialize
button (this will setup the model and prepare your inputs). -
Finetune: Click
Finetune
button to launch the finetuning on your inputs. Please wait until finetuning is finished (which takes ~1 minute, you will see button changes fromFinetuning...
back toFinetune
when it's done). -
Inference:
-
Unconditional inpainting: make sure to uncheck all the boxes in the top row (i.e.,
Enable text
,Enable exemplar
,Enable stroke
), then clickInference
button. -
Text inpainting: In Text condition section (3rd column), first check the box
Enable text
, and input your text prompt, then clickInference
button. -
Exemplar inpainting: In Exemplar condition section (2nd column), check the box
Enable exemplar
, then clickInference
button. -
Stroke inpainting: In Stroke condition section (last column), first check the box
Enable stroke
, then you will see the masked input being displayed below, use the color brush tool to draw the color stroke within the black masked area. Or you can upload your own stroke image (the background needs to be black). Finally clickInference
button. -
Mixed inpainting: for example, to perform text + stroke inpainting, check both
Enable text
andEnable stroke
boxe, uncheckEnable exemplar
box, input your text prompt and draw the color stroke, then clickInference
button.Note: you can adjust the stroke blending timestep slide bar to adjust the realism-faithfulness trade-off (larger value leads to more realistic but less aligned result).
-
-
Outputs: The generated results will be shown at bottom row.
Other notes:
- If you change the input image and/or exemplar image, you need to redo the Initialization and Finetuning process (repeat step 3-7).
- If you want to change text and/or stroke image, you do NOT need to repeat Initialization and Finetuning (just re-do step 7).
- Sometimes the stroke image may not be fully or successfully displayed after clicking
Enable stroke
box, this might be caused by the unknown bug of the gradio, check and uncheck theEnable stroke
box several times can solve this issue.
Citation
@inproceedings{unipaint,
author = {Yang, Shiyuan and Chen, Xiaodong and Liao, Jing},
title = {Uni-Paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model},
year = {2023},
isbn = {9798400701085},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3581783.3612200},
booktitle = {Proceedings of the 31st ACM International Conference on Multimedia},
pages = {3190–3199},
location = {Ottawa ON, Canada},
series = {MM '23}
}
Acknowledgment
The code is built based on LDM and Textual Inversion.