VISII - Visual Instruction Inversion
arXiv | BibTeX | Project Page
Visii learns an instruction from a before → after image pair, then applies it to new images to perform the same edit.
Visual Instruction Inversion: Image Editing via Image Prompting (NeurIPS 2023)<br> Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee <br> University of Wisconsin-Madison
TL;DR: A framework for inverting visual prompts into editing instructions for text-to-image diffusion models.
<div style="text-align:center;"> <img src="./assets/images/your-drawing-of-your-cat.png" alt="result" height="350"/> </div>

ELI5: You show the machine how to perform a task (with a pair of images), and then it replicates your edit. For example, it can learn your drawing style and use it to create a new drawing.
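Under the hood, Visii treats the instruction as a learnable embedding and optimizes it so that a frozen editing model maps the before image to the after image. The toy sketch below shows that idea in spirit only; the `ToyEditor` stand-in, the dimensions, and the plain MSE loss are illustrative placeholders, not the repo's actual training code (which works with a frozen InstructPix2Pix diffusion model).

```python
# Toy sketch of the core idea (not the repo's actual training code):
# freeze an image editor, and optimize a soft instruction embedding so that
# editor(before, instruction) reconstructs the after image.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyEditor(nn.Module):
    """Stand-in for the frozen editing model (InstructPix2Pix in VISII)."""
    def __init__(self, img_dim=64, ins_dim=16):
        super().__init__()
        self.net = nn.Linear(img_dim + ins_dim, img_dim)

    def forward(self, image, instruction):
        return self.net(torch.cat([image, instruction], dim=-1))

editor = ToyEditor().requires_grad_(False)             # frozen editor
before = torch.randn(1, 64)                            # "before" image (as features)
after = torch.randn(1, 64)                             # "after" image (as features)
instruction = torch.zeros(1, 16, requires_grad=True)   # learnable <ins> embedding

opt = torch.optim.Adam([instruction], lr=1e-2)
for step in range(1000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(editor(before, instruction), after)
    loss.backward()
    opt.step()

# `instruction` now encodes the before -> after edit and can be reused on new images.
```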
Jump to: Requirements | Quickstart | Visii + Ip2p | Visii + ControlNet | BibTeX | Go Crazy
Requirements
This script is tested on an NVIDIA RTX 3090 with Python 3.7, PyTorch 1.13.0, and diffusers.
pip install -r requirements.txt
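An optional sanity check that your environment roughly matches the tested setup:

```python
# Optional sanity check against the tested setup (RTX 3090, PyTorch 1.13.0, diffusers).
import torch
import diffusers

print("torch:", torch.__version__)           # tested with 1.13.0
print("diffusers:", diffusers.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # tested on an NVIDIA RTX 3090
```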
Quickstart
Visual Instruction Inversion with InstructPix2Pix.
# optimize <ins> (default checkpoint)
python train.py --image_folder ./images --subfolder painting1
# test <ins>
python test.py
# hybrid instruction: <ins> + "a husky" (default checkpoint)
python test.py --hybrid_ins True --prompt "a husky" --guidance_scale 10
Result images will be saved in the `./result` folder.
Visii learns an editing instruction from a dog → watercolor dog image pair, then applies it to a new image to perform the same edit. You can also concatenate new information to achieve new effects: dog → watercolor husky.
<table> <tr> <td></td> <td><i>Different photos are generated from different noises.</i></td> </tr> <tr> <td style='text-align:right;'><font style="color:blue"> <ins> </font></td> <td colspan=1><img src="./assets/images/ins.png"></td> </tr> <tr> <td><ins> + <b>"a husky"</b></td> <td><img src="./assets/images/husky.png"></td> </tr> <tr> <td><ins> + <b>"a squirrel"</b></td> <td><img src="./assets/images/squirrel.png"></td> </tr> <tr> <td><ins> + <b>"a tiger"</b></td> <td><img src="./assets/images/tiger.png"></td> </tr> <tr> <td><ins> + <b>"a rabbit"</b></td> <td><img src="./assets/images/rabbit.png"></td> </tr> <tr> <td><ins> + <b>"a blue jay"</b></td> <td><img src="./assets/images/bluejay.png"></td> </tr> <tr> <td><ins> + <b>"a polar bear"</b></td> <td><img src="./assets/images/polar.png"></td> </tr> <tr> <td><ins> + <b>"a badger"</b></td> <td><img src="./assets/images/badger.png"></td> </tr> <tr> <td colspan=2 style='text-align:right;'> <i>on & on ...</i></td> </tr> </table>

<table> <tr> <td><img src="./assets/images/guidance_scale.png"></td> </tr> <tr><td><b><ins> + "a poodle"</b>: From left to right: increasing guidance scale (4, 6, 8, 10, 12, 14)</td></tr> </table>

<i>If you're not getting the quality you want, try tuning the <b>guidance_scale</b>.</i>
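Conceptually, a hybrid instruction combines the learned `<ins>` embedding with the embedding of the extra phrase before it is fed to the editing model as conditioning. The sketch below shows one simple way such a concatenation could look; the CLIP checkpoint, the number of soft tokens, and the tensor shapes are assumptions for illustration, not the repo's exact implementation.

```python
# Illustrative sketch of a "hybrid" instruction: concatenate the learned <ins>
# token embeddings with the embeddings of a new phrase (e.g. "a husky").
# The CLIP checkpoint and tensor shapes are assumptions for illustration only.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Learned instruction: a few soft prompt vectors in the text-embedding space
# (stand-in for the optimized <ins>).
learned_ins = torch.randn(1, 8, text_encoder.config.hidden_size)

# Embed the extra phrase with the frozen text encoder.
tokens = tokenizer("a husky", return_tensors="pt")
with torch.no_grad():
    phrase_emb = text_encoder(**tokens).last_hidden_state   # (1, seq_len, hidden)

# Hybrid conditioning: learned edit + new content, passed to the diffusion model
# as a single sequence of conditioning embeddings.
hybrid = torch.cat([learned_ins, phrase_emb], dim=1)
print(hybrid.shape)
```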
Starbucks Logo
Inspired by this Reddit post, we tested Visii + InstructPix2Pix on the Starbucks and Gandour logos.
<table> <tr> <td><b>Before:</b><br><img src="./assets/images/prior-vs-ours/starbuck_0_0.png" alt="before" width='200'/></td> <td><b>After:</b><br><img src="./assets/images/prior-vs-ours/starbuck_0_1.png" alt="after" width='200'/></td> <td colspan=2></td> </tr> <tr> <td><br><b>Test:</b><br><img src="./assets/images/prior-vs-ours/starbucks_1_0.png" alt="test" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Wonder Woman"<br><img src="./assets/images/prior-vs-ours/starbucks_wonder_woman.png" alt="ours" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Scarlet Witch"<br><img src="./assets/images/prior-vs-ours/starbucks_scarlet_witch.png" alt="ours" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Daenerys Targaryen"<br><img src="./assets/images/prior-vs-ours/starbucks_dragon.png" alt="ours" width='200'/></td> </tr> <tr> <td></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Neytiri in Avatar"<br><img src="./assets/images/prior-vs-ours/starbucks_avatar.png" alt="ours" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "She-Hulk"<br><img src="./assets/images/prior-vs-ours/starbucks_shehulk.png" alt="ours" width='200'/></td> <td><b><font style="color:blue"> <ins> <br></font></b>+ "Maleficent"<br><img src="./assets/images/prior-vs-ours/starbucks_maleficent.png" alt="ours" width='200'/></td> </tr> </table>

(If you're still not getting the quality you want, try tuning the InstructPix2Pix parameters. See Tips or Optimizing Progress for more details.)
Visual Instruction Inversion
1. Prepare before-after images: A basic structure for the image folder should look like below. `{image_name}_{0}.png` denotes the before image and `{image_name}_{1}.png` denotes the after image. By default, we use `0_0.png` as the before image, `0_1.png` as the after image, and `1_0.png` as the test image.
{image_folder}
└───{subfolder}
    │ 0_0.png   # before image
    │ 0_1.png   # after image
    │ 1_0.png   # test image
Check `./images/painting1` for an example folder structure.
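If you want to try your own image pair, a small helper like the one below can lay the files out in that structure; the 512×512 resize is just a convenience assumption, not a repo requirement.

```python
# Helper to arrange your own images into the expected layout:
#   {image_folder}/{subfolder}/0_0.png (before), 0_1.png (after), 1_0.png (test)
from pathlib import Path
from PIL import Image

def prepare_example(before_path, after_path, test_path,
                    image_folder="./images", subfolder="my_edit", size=512):
    out = Path(image_folder) / subfolder
    out.mkdir(parents=True, exist_ok=True)
    for src, name in [(before_path, "0_0.png"),
                      (after_path, "0_1.png"),
                      (test_path, "1_0.png")]:
        # Resize to a common resolution so the before/after pair lines up.
        Image.open(src).convert("RGB").resize((size, size)).save(out / name)

# prepare_example("my_before.jpg", "my_after.jpg", "my_test.jpg")
```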
2. Instruction Optimization: Check `./configs/ip2p_config.yaml` for more details on hyper-parameters and settings.
Visii + InstructPix2Pix
# optimize <ins> (default checkpoint)
python train.py --image_folder ./images --subfolder painting1
# test <ins>
python test.py --log_folder ip2p_painting1_0_0.png
# hybrid instruction: <ins> + "a husky" (default checkpoint)
python test_concat.py --prompt "a husky"
Visii + ControlNet!
We plugged Visii into ControlNet 1.1 InstructPix2Pix.
# optimize <ins> (default checkpoint)
python train_controlnet.py --image_folder ./images --subfolder painting1
# test <ins>
python test_controlnet.py --log_folder controlnet_painting1_0_0.png
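For reference, here is a minimal diffusers sketch of loading the ControlNet 1.1 InstructPix2Pix backbone that Visii plugs into; the Hugging Face model IDs and the usage pattern are assumptions based on the public ControlNet 1.1 / Stable Diffusion 1.5 releases, not code from this repo.

```python
# Minimal sketch of loading ControlNet 1.1 InstructPix2Pix with diffusers.
# Model IDs are assumptions based on the public releases; adjust to your setup.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11e_sd15_ip2p", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# For this ControlNet, the "control image" is simply the original (before) image,
# and the prompt plays the role of the editing instruction.
# edited = pipe("make it a watercolor painting", image=before_image).images[0]
```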
Optimizing Progress
By default, we use the lowest-MSE checkpoint (`./logs/{foldername}/best.pth`) as the final instruction. Sometimes the `best.pth` checkpoint might not yield the best result. If you want to use a different checkpoint, you can specify it using the `--checkpoint_number` argument. A visualization of the optimization progress is saved in `./logs/{foldername}/eval_100.png`. You can visually select the best checkpoint for testing.
# test <ins> (with specified checkpoint)
python test.py --log_folder ip2p_painting1_0_0.png --checkpoint_number 800
# hybrid instruction: <ins> + "a husky" (with specified checkpoint)
python test_concat.py --prompt "a husky" --checkpoint_number 800
<table>
<tr>
<td><img src='./assets/images/optim_progress.png'/></td>
</tr>
<tr>
<td><i>From left to right: [Before, After, <b>Iter 0, Iter 100, ..., Iter 900</b>]. You can visually select the best checkpoint for testing.</i></td>
</tr>
</table>
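If you want to see which checkpoints a run actually produced before picking `--checkpoint_number`, something like the snippet below works; the numbered-checkpoint filename pattern is an assumption, so adjust it to whatever `train.py` writes in your setup.

```python
# List candidate checkpoints in a log folder before choosing --checkpoint_number.
# The exact filenames of intermediate checkpoints are an assumption; adjust as needed.
from pathlib import Path

log_folder = Path("./logs/ip2p_painting1_0_0.png")
for ckpt in sorted(log_folder.glob("*.pth")):
    print(ckpt.name)   # e.g. best.pth plus any intermediate checkpoints that were saved
```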
- Side note: Before-after images should be aligned for better results.
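A rough way to pre-align a pair is to center-crop both images to a shared square and resize them to the same resolution; this is purely a convenience sketch, not part of the repo.

```python
# Rough alignment helper: center-crop both images to a shared square and resize,
# so the before/after pair overlaps pixel-wise as closely as possible.
from PIL import Image, ImageOps

def align_pair(before_path, after_path, size=512):
    before = ImageOps.fit(Image.open(before_path).convert("RGB"), (size, size))
    after = ImageOps.fit(Image.open(after_path).convert("RGB"), (size, size))
    return before, after

# before, after = align_pair("0_0.png", "0_1.png")
# before.save("0_0.png"); after.save("0_1.png")
```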
Acknowledgement
Our code is based on InstructPix2Pix, Hard Prompts Made Easy, Imagic, and Textual Inversion. You might also check out the awesome Visual Prompting via Image Inpainting. Thank you!
Photo credit: Bo the Shiba & Mam the Cat.
<!-- #### To-do
- [x] Visii + Ip2p
- [x] Visii + ControlNet
- [x] Validate: Visii + Ip2p
- [ ] Validate: Visii + ControlNet
-->

BibTeX
@inproceedings{
nguyen2023visual,
title={Visual Instruction Inversion: Image Editing via Image Prompting},
author={Thao Nguyen and Yuheng Li and Utkarsh Ojha and Yong Jae Lee},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=l9BsCh8ikK}
}