VISII - Visual Instruction Inversion
arXiv | BibTeX | Project Page
Visii learns an instruction from a before → after image pair, then applies it to new images to perform the same edit.
Visual Instruction Inversion: Image Editing via Image Prompting (NeurIPS 2023)<br> Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee <br> University of Wisconsin-Madison
TL;DR: A framework for inverting visual prompts into editing instructions for text-to-image diffusion models.
<div style="text-align:center;"> <img src="./assets/images/your-drawing-of-your-cat.png" alt="result" height="350"/> </div>

ELI5: You show the machine how to perform a task (with a pair of images), and then it replicates your edit. For example, it can learn your drawing style and use it to create a new drawing.
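Under the hood, Visii treats the instruction as a learnable embedding and optimizes it so that a frozen editing model maps the before image to the after image. The toy sketch below shows that idea in spirit only; the `ToyEditor` stand-in, the dimensions, and the plain MSE loss are illustrative placeholders, not the repo's actual training code (which works with a frozen InstructPix2Pix diffusion model).

```python
# Toy sketch of the core idea (not the repo's actual training code):
# freeze an image editor, and optimize a soft instruction embedding so that
# editor(before, instruction) reconstructs the after image.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyEditor(nn.Module):
    """Stand-in for the frozen editing model (InstructPix2Pix in VISII)."""
    def __init__(self, img_dim=64, ins_dim=16):
        super().__init__()
        self.net = nn.Linear(img_dim + ins_dim, img_dim)

    def forward(self, image, instruction):
        return self.net(torch.cat([image, instruction], dim=-1))

editor = ToyEditor().requires_grad_(False)             # frozen editor
before = torch.randn(1, 64)                            # "before" image (as features)
after = torch.randn(1, 64)                             # "after" image (as features)
instruction = torch.zeros(1, 16, requires_grad=True)   # learnable <ins> embedding

opt = torch.optim.Adam([instruction], lr=1e-2)
for step in range(1000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(editor(before, instruction), after)
    loss.backward()
    opt.step()

# `instruction` now encodes the before -> after edit and can be reused on new images.
```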
Jump to: Requirements | Quickstart | Visii + Ip2p | Visii + ControlNet | BibTeX | Go Crazy
Requirements
This script is tested on an NVIDIA RTX 3090 with Python 3.7, PyTorch 1.13.0, and diffusers.
pip install -r requirements.txt
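An optional sanity check that your environment roughly matches the tested setup:

```python
# Optional sanity check against the tested setup (RTX 3090, PyTorch 1.13.0, diffusers).
import torch
import diffusers

print("torch:", torch.__version__)           # tested with 1.13.0
print("diffusers:", diffusers.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # tested on an NVIDIA RTX 3090
```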
Quickstart
Visual Instruction Inversion with InstructPix2Pix.
# optimize <ins> (default checkpoint)
python train.py --image_folder ./images --subfolder painting1
# test <ins>
python test.py
# hybrid instruction: <ins> + "a husky" (default checkpoint)
python test.py --hybrid_ins True --prompt "a husky" --guidance_scale 10
Result images will be saved in the `./result` folder.
Visii learns an editing instruction from a dog → watercolor dog image pair, then applies it to a new image to perform the same edit. You can also concatenate new information to achieve new effects: dog → watercolor husky.
<table> <tr> <td></td> <td><i>Different photos are generated from different noises.</i></td> </tr> <tr> <td style='text-align:right;'><font style="color:blue"> <ins> </font></td> <td colspan=1><img src="./assets/images/ins.png"></td> </tr> <tr> <td><ins> + <b>"a husky"</b></td> <td><img src="./assets/images/husky.png"></td> </tr> <tr> <td><ins> + <b>"a squirrel"</b></td> <td><img src="./assets/images/squirrel.png"></td> </tr> <tr> <td><ins> + <b>"a tiger"</b></td> <td><img src="./assets/images/tiger.png"></td> </tr> <tr> <td><ins> + <b>"a rabbit"</b></td> <td><img src="./assets/images/rabbit.png"></td> </tr> <tr> <td><ins> + <b>"a blue jay"</b></td> <td><img src="./assets/images/bluejay.png"></td> </tr> <tr> <td><ins> + <b>"a polar bear"</b></td> <td><img src="./assets/images/polar.png"></td> </tr> <tr> <td><ins> + <b>"a badger"</b></td> <td><img src="./assets/images/badger.png"></td> </tr> <tr> <td colspan=2 style='text-align:right;'> <i>on & on ...</i></td> </tr> </table>

<table> <tr> <td><img src="./assets/images/guidance_scale.png"></td> </tr> <tr><td><b><ins> + "a poodle"</b>: From left to right: increasing guidance scale (4, 6, 8, 10, 12, 14)</td></tr> </table>

<i>If you're not getting the quality you want, try tuning the <b>guidance_scale</b>.</i>
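Conceptually, a hybrid instruction combines the learned `<ins>` embedding with the embedding of the extra phrase before it is fed to the editing model as conditioning. The sketch below shows one simple way such a concatenation could look; the CLIP checkpoint, the number of soft tokens, and the tensor shapes are assumptions for illustration, not the repo's exact implementation.

```python
# Illustrative sketch of a "hybrid" instruction: concatenate the learned <ins>
# token embeddings with the embeddings of a new phrase (e.g. "a husky").
# The CLIP checkpoint and tensor shapes are assumptions for illustration only.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Learned instruction: a few soft prompt vectors in the text-embedding space
# (stand-in for the optimized <ins>).
learned_ins = torch.randn(1, 8, text_encoder.config.hidden_size)

# Embed the extra phrase with the frozen text encoder.
tokens = tokenizer("a husky", return_tensors="pt")
with torch.no_grad():
    phrase_emb = text_encoder(**tokens).last_hidden_state   # (1, seq_len, hidden)

# Hybrid conditioning: learned edit + new content, passed to the diffusion model
# as a single sequence of conditioning embeddings.
hybrid = torch.cat([learned_ins, phrase_emb], dim=1)
print(hybrid.shape)
```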
Starbucks Logo
Inspired by this Reddit post, we tested Visii + InstructPix2Pix on the Starbucks and Gandour logos.
<table> <tr> <td><b>Before:</b><br><img src="./assets/images/prior-vs-ours/starbuck_0_0.png" alt="before" width='200'/></td> <td><b>After:</b><br><img src="./assets/images/prior-vs-ours/starbuck_0_1.png" alt="after" width='200'/></td> <td colspan=2></td> </tr> <tr> <td><br><b>Test:</b><br><img src="./assets/images/prior-vs-ours/starbucks_1_0.png" alt="test" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Wonder Woman"<br><img src="./assets/images/prior-vs-ours/starbucks_wonder_woman.png" alt="ours" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Scarlet Witch"<br><img src="./assets/images/prior-vs-ours/starbucks_scarlet_witch.png" alt="ours" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Daenerys Targaryen"<br><img src="./assets/images/prior-vs-ours/starbucks_dragon.png" alt="ours" width='200'/></td> </tr> <tr> <td></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Neytiri in Avatar"<br><img src="./assets/images/prior-vs-ours/starbucks_avatar.png" alt="ours" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "She-Hulk"<br><img src="./assets/images/prior-vs-ours/starbucks_shehulk.png" alt="ours" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Maleficent"<br><img src="./assets/images/prior-vs-ours/starbucks_maleficent.png" alt="ours" width='200'/></td> </tr> </table>

(If you're still not getting the quality you want, try tuning the InstructPix2Pix parameters. See Tips or Optimizing Progress for more details.)
Visual Instruction Inversion
1. Prepare before-after images: A basic structure for the image folder should look like below. `{image_name}_{0}.png` denotes the before image and `{image_name}_{1}.png` denotes the after image. By default, we use `0_0.png` as the before image, `0_1.png` as the after image, and `1_0.png` as the test image.
{image_folder}
└───{subfolder}
    │ 0_0.png   # before image
    │ 0_1.png   # after image
    │ 1_0.png   # test image
Check `./images/painting1` for an example folder structure.
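If you want to try your own image pair, a small helper like the one below can lay the files out in that structure; the 512×512 resize is just a convenience assumption, not a repo requirement.

```python
# Helper to arrange your own images into the expected layout:
#   {image_folder}/{subfolder}/0_0.png (before), 0_1.png (after), 1_0.png (test)
from pathlib import Path
from PIL import Image

def prepare_example(before_path, after_path, test_path,
                    image_folder="./images", subfolder="my_edit", size=512):
    out = Path(image_folder) / subfolder
    out.mkdir(parents=True, exist_ok=True)
    for src, name in [(before_path, "0_0.png"),
                      (after_path, "0_1.png"),
                      (test_path, "1_0.png")]:
        # Resize to a common resolution so the before/after pair lines up.
        Image.open(src).convert("RGB").resize((size, size)).save(out / name)

# prepare_example("my_before.jpg", "my_after.jpg", "my_test.jpg")
```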
2. Instruction Optimization: Check `./configs/ip2p_config.yaml` for more details on hyper-parameters and settings.
Visii + InstructPix2Pix
# optimize <ins> (default checkpoint)
python train.py --image_folder ./images --subfolder painting1
# test <ins>
python test.py --log_folder ip2p_painting1_0_0.png
# hybrid instruction: <ins> + "a husky" (default checkpoint)
python test_concat.py --prompt "a husky"
Visii + ControlNet!
We plugged Visii into ControlNet 1.1 InstructPix2Pix.
# optimize <ins> (default checkpoint)
python train_controlnet.py --image_folder ./images --subfolder painting1
# test <ins>
python test_controlnet.py --log_folder controlnet_painting1_0_0.png
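For reference, here is a minimal diffusers sketch of loading the ControlNet 1.1 InstructPix2Pix backbone that Visii plugs into; the Hugging Face model IDs and the usage pattern are assumptions based on the public ControlNet 1.1 / Stable Diffusion 1.5 releases, not code from this repo.

```python
# Minimal sketch of loading ControlNet 1.1 InstructPix2Pix with diffusers.
# Model IDs are assumptions based on the public releases; adjust to your setup.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11e_sd15_ip2p", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# For this ControlNet, the "control image" is simply the original (before) image,
# and the prompt plays the role of the editing instruction.
# edited = pipe("make it a watercolor painting", image=before_image).images[0]
```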
Optimizing Progress
By default, we use the lowest-MSE checkpoint (`./logs/{foldername}/best.pth`) as the final instruction. Sometimes the `best.pth` checkpoint might not yield the best result. If you want to use a different checkpoint, you can specify it using the `--checkpoint_number` argument. A visualization of the optimization progress is saved in `./logs/{foldername}/eval_100.png`. You can visually select the best checkpoint for testing.
# test <ins> (with specified checkpoint)
python test.py --log_folder ip2p_painting1_0_0.png --checkpoint_number 800
# hybrid instruction: <ins> + "a husky" (with specified checkpoint)
python test_concat.py --prompt "a husky" --checkpoint_number 800
<table>
<tr>
<td><img src='./assets/images/optim_progress.png'/></td>
</tr>
<tr>
<td><i>From left to right: [Before, After, <b>Iter 0, Iter 100, ..., Iter 900</b>]. You can visually select the best checkpoint for testing.</i></td>
</tr>
</table>
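If you want to see which checkpoints a run actually produced before picking `--checkpoint_number`, something like the snippet below works; the numbered-checkpoint filename pattern is an assumption, so adjust it to whatever `train.py` writes in your setup.

```python
# List candidate checkpoints in a log folder before choosing --checkpoint_number.
# The exact filenames of intermediate checkpoints are an assumption; adjust as needed.
from pathlib import Path

log_folder = Path("./logs/ip2p_painting1_0_0.png")
for ckpt in sorted(log_folder.glob("*.pth")):
    print(ckpt.name)   # e.g. best.pth plus any intermediate checkpoints that were saved
```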
- Side note: Before-after images should be aligned for better results.
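A rough way to pre-align a pair is to center-crop both images to a shared square and resize them to the same resolution; this is purely a convenience sketch, not part of the repo.

```python
# Rough alignment helper: center-crop both images to a shared square and resize,
# so the before/after pair overlaps pixel-wise as closely as possible.
from PIL import Image, ImageOps

def align_pair(before_path, after_path, size=512):
    before = ImageOps.fit(Image.open(before_path).convert("RGB"), (size, size))
    after = ImageOps.fit(Image.open(after_path).convert("RGB"), (size, size))
    return before, after

# before, after = align_pair("0_0.png", "0_1.png")
# before.save("0_0.png"); after.save("0_1.png")
```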
Acknowledgement
Our code is based on InstructPix2Pix, Hard Prompts Made Easy, Imagic, and Textual Inversion. You might also check out the awesome Visual Prompting via Image Inpainting. Thank you!
Photo credit: Bo the Shiba & Mam the Cat.
<!-- #### To-do
- [x] Visii + Ip2p
- [x] Visii + ControlNet
- [x] Validate: Visii + Ip2p
- [ ] Validate: Visii + ControlNet
-->

BibTeX
@inproceedings{
nguyen2023visual,
title={Visual Instruction Inversion: Image Editing via Image Prompting},
author={Thao Nguyen and Yuheng Li and Utkarsh Ojha and Yong Jae Lee},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=l9BsCh8ikK}
}