Awesome

InstructPix2Pix

UnOfficial Pytorch implementation of 'InstructPix2Pix Learning to Follow Image Editing Instructions', based on https://github.com/JoePenna/Dreambooth-Stable-Diffusion

Inference

python3 stable_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 1 --ddim_steps 50 --ckpt logs/instruct/checkpoints/last.ckpt --W 256 --H 256 --init_img ./samples/tower.jpg --prompt "add fireworks in sky" result

python3 stable_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 1 --ddim_steps 50 --ckpt logs/instruct/checkpoints/last.ckpt --W 512 --H 512 --init_img ./samples/Vermeer_Girl.jpg --prompt "Apply face paint" result

python3 stable_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 1 --ddim_steps 50 --ckpt logs/instruct/checkpoints/last.ckpt --W 512 --H 512 --init_img ./training_images/Vermeer_Girl.jpg --prompt "What if she were in an anime?" result

python3 stable_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 1 --ddim_steps 50 --ckpt logs/instruct/checkpoints/last.ckpt --W 256 --H 256 --init_img ./training_images/dog.jpg --prompt "pig" result

python3 stable_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 1 --ddim_steps 50 --ckpt logs/instruct/checkpoints/last.ckpt --W 256 --H 256 --init_img ./samples/dog.jpg --prompt "dog in Paris" result

python3 stable_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 1 --ddim_steps 50 --ckpt logs/instruct/checkpoints/last.ckpt --W 256 --H 256 --init_img ./samples/sunflowers.jpg --prompt "roses" result

python3 stable_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 1 --ddim_steps 50 --ckpt logs/instruct/checkpoints/last.ckpt --W 256 --H 256 --init_img ./samples/girl.jpg --prompt "She should look 100 years old" --negprompt "deformed" result

python3 stable_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 1 --ddim_steps 50 --ckpt logs/instruct/checkpoints/last.ckpt --W 512 --H 512 --init_img ./samples/girl.jpg --prompt "make hair red" result

Checkpoint

Link: https://drive.google.com/file/d/1vn9qG4kLvXPNJAT-PW7Exwas7MyT7JBu/view?usp=sharing

Implementation deatils

Add additional input channels to the first convolutional layer. All available weights of the diffusion model are initialized from the pretrained checkpoints, and weights that operate on the newly added input channels are initialized to zero. Besides, I add one more GroupNorm32/SiLU/conv_nd layer than original paper.
Set learing rate set 1e-4, batch size 32

Data Preperation

This is tough and money-consuming part...