img2img-turbo

Paper | Sketch2Image Demo

Quick start: Running Locally | Gradio (locally hosted) | Training

Cat Sketching

<p align="left" > <img src="https://raw.githubusercontent.com/GaParmar/img2img-turbo/main/assets/cat_2x.gif" width="800" /> </p>

Fish Sketching

<p align="left"> <img src="https://raw.githubusercontent.com/GaParmar/img2img-turbo/main/assets/fish_2x.gif" width="800" /> </p>

We propose a general method for adapting a single-step diffusion model, such as SD-Turbo, to new tasks and domains through adversarial learning. This enables us to leverage the internal knowledge of pre-trained diffusion models while achieving efficient inference (e.g., 0.29 seconds for a 512x512 image on an A6000 and 0.11 seconds on an A100).

Our one-step conditional models, CycleGAN-Turbo and pix2pix-turbo, can perform various image-to-image translation tasks in both unpaired and paired settings. CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods, while pix2pix-turbo is on par with recent works such as ControlNet for Sketch2Photo and Edge2Image while requiring only a single inference step.

One-Step Image Translation with Text-to-Image Models<br> Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, Jun-Yan Zhu<br> CMU and Adobe, arXiv 2403.12036

<br> <div> <p align="center"> <img src='assets/teaser_results.jpg' align="center" width=1000px> </p> </div>

Results

Paired Translation with pix2pix-turbo

Edge to Image

<div> <p align="center"> <img src='assets/edge_to_image_results.jpg' align="center" width=800px> </p> </div> <!-- **Sketch to Image** TODO -->

Generating Diverse Outputs

By varying the input noise map, our method can generate diverse outputs from the same input conditioning. The output style can be controlled by changing the text prompt.

<div> <p align="center"> <img src='assets/gen_variations.jpg' align="center" width=800px> </p> </div>

Unpaired Translation with CycleGAN-Turbo

Day to Night

<div> <p align="center"> <img src='assets/day2night_results.jpg' align="center" width=800px> </p> </div>

Night to Day

<div><p align="center"> <img src='assets/night2day_results.jpg' align="center" width=800px> </p> </div>

Clear to Rainy

<div> <p align="center"> <img src='assets/clear2rainy_results.jpg' align="center" width=800px> </p> </div>

Rainy to Clear

<div> <p align="center"> <img src='assets/rainy2clear.jpg' align="center" width=800px> </p> </div> <hr>

Method

Our Generator Architecture: We tightly integrate three separate modules of the original latent diffusion model into a single end-to-end network with a small number of trainable weights. This architecture allows us to translate the input image x to the output y while retaining the input scene structure. We use LoRA adapters in each module, introduce skip connections and Zero-Convs between the input and output, and retrain the first layer of the U-Net. Blue boxes indicate trainable layers; semi-transparent layers are frozen. The same generator can be used for various GAN objectives.

<div> <p align="center"> <img src='assets/method.jpg' align="center" width=900px> </p> </div>

Getting Started

Environment Setup
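A typical setup looks like the following; the conda environment file name is an assumption, so check the repository root if it differs.

```bash
git clone https://github.com/GaParmar/img2img-turbo
cd img2img-turbo
# Environment file name is an assumption; check the repo root.
conda env create -f environment.yaml
conda activate img2img-turbo
```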

Paired Image Translation (pix2pix-turbo)
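A minimal paired-translation run might look like this; the script path and flags below are assumptions and should be verified against the repository.

```bash
# Hypothetical script name and flags; verify against the repository.
python src/inference_paired.py \
    --model_name "edge_to_image" \
    --input_image "path/to/edge_map.png" \
    --prompt "a photo of a modern house at sunset" \
    --output_dir "outputs"
```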

Unpaired Image Translation (CycleGAN-Turbo)
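Similarly, a hedged sketch of an unpaired run with CycleGAN-Turbo, where the script path, model name, and flags are again assumptions:

```bash
# Hypothetical script name and flags; verify against the repository.
python src/inference_unpaired.py \
    --model_name "day_to_night" \
    --input_image "path/to/day_image.png" \
    --output_dir "outputs"
```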

Gradio Demo
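To try the sketch-to-image demo locally, a launch along these lines should work; the app's file name is an assumption.

```bash
# App file name is an assumption; check the repository root for the
# actual Gradio script.
python gradio_sketch2image.py
```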

Training with your own data
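Training on a custom dataset would typically be launched through accelerate; the script name, flags, and dataset layout below are assumptions, so consult the repository's training documentation for the real interface.

```bash
# Hypothetical training launch; script name and flags are assumptions.
accelerate launch src/train_pix2pix_turbo.py \
    --pretrained_model_name_or_path "stabilityai/sd-turbo" \
    --dataset_folder "data/my_paired_dataset" \
    --output_dir "outputs/my_model"
```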

Acknowledgment

Our work uses Stable Diffusion Turbo (SD-Turbo) as the base model, which is released under its own LICENSE.