

ReVersion (SIGGRAPH Asia, 2024)


This repository contains the implementation of the following paper:

ReVersion: Diffusion-Based Relation Inversion from Images<br> Ziqi Huang<sup>∗</sup>, Tianxing Wu<sup>∗</sup>, Yuming Jiang, Kelvin C.K. Chan, Ziwei Liu<br>

From MMLab@NTU affiliated with S-Lab, Nanyang Technological University

<!-- [[Paper](https://arxiv.org/abs/2303.13495)] | --> <!-- [[Project Page](https://ziqihuangg.github.io/projects/reversion.html)] | --> <!-- [[Video](https://www.youtube.com/watch?v=pkal3yjyyKQ)] | --> <!-- [[Dataset](https://drive.google.com/drive/folders/1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing)] --> <!-- [[Huggingface Demo](https://huggingface.co/spaces/Ziqi/ReVersion)] | -->

:open_book: Overview


We propose a new task, Relation Inversion: Given a few exemplar images, where a relation co-exists in every image, we aim to find a relation prompt <R> to capture this interaction, and apply the relation to new entities to synthesize new scenes. The above images are generated by our ReVersion framework.

:heavy_check_mark: Updates

:hammer: Installation

  1. Clone Repo

    git clone https://github.com/ziqihuangg/ReVersion
    cd ReVersion
  2. Create Conda Environment and Install Dependencies

    conda create -n reversion
    conda activate reversion
    conda install python=3.8 pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch
    pip install diffusers["torch"]
    pip install -r requirements.txt
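
After installation, a quick look at the installed versions can confirm that the environment matches the pins above. The following is a minimal sketch and not part of the repository: the helper name `report_versions` is hypothetical, and it relies only on the standard-library `importlib.metadata` (available from Python 3.8). It assumes the conda packages register their distribution metadata under the names `torch` and `torchvision`; if they do not, the helper reports them as missing.

```python
from importlib import metadata

# Distribution names to report: torch and torchvision come from the conda
# install above, diffusers from pip. The names are an assumption about how
# the packages register themselves.
PACKAGES = ("torch", "torchvision", "diffusers")


def report_versions(packages=PACKAGES):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```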

:page_with_curl: Usage

Relation Inversion

Given a set of exemplar images and their entities' coarse descriptions, you can optimize a relation prompt <R> to capture the co-existing relation in these images, namely Relation Inversion.

  1. Prepare the exemplar images (<em>e.g.</em>, 0.jpg - 9.jpg) and coarse descriptions (text.json), and put them inside one folder. You can use our ReVersion benchmark or prepare your own images. An example from our ReVersion benchmark is as follows:

    ├── painted_on
    │   ├── 0.jpg
    │   ├── 1.jpg
    │   ├── 2.jpg
    │   ├── 3.jpg
    │   ├── 4.jpg
    │   ├── 5.jpg
    │   ├── 6.jpg
    │   ├── 7.jpg
    │   ├── 8.jpg
    │   ├── 9.jpg
    │   └── text.json
  2. Taking the relation painted_on as an example, you can start training with this script:

    accelerate launch \
        --config_file="./configs/single_gpu.yml" \
        train.py \
        --seed="2023" \
        --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
        --train_data_dir="./reversion_benchmark_v1/painted_on" \
        --placeholder_token="<R>" \
        --initializer_token="and" \
        --train_batch_size="2" \
        --gradient_accumulation_steps="4" \
        --max_train_steps="3000" \
        --learning_rate='2.5e-04' --scale_lr \
        --lr_scheduler="constant" \
        --lr_warmup_steps="0" \
        --output_dir="./experiments/painted_on" \
        --save_steps="1000" \
        --importance_sampling \
        --denoise_loss_weight="1.0" \
        --steer_loss_weight="0.01" \
        --num_positives="4" \
        --temperature="0.07"

    Here, train_data_dir is the path to the exemplar images and coarse descriptions, and output_dir is the path where the inverted relation and the experiment logs are saved. To generate relation-specific images, follow the next section, Generation.

    Note that the only_save_embeds option allows you to only save the relation prompt <R>, without having to save the entire Stable Diffusion model. You can decide whether to turn it on.
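
Before launching training on your own data, it may help to confirm that the folder passed as train_data_dir has the layout shown in step 1. The sketch below is not part of the repository, and the helper name `check_exemplar_folder` is hypothetical. It checks only what the example layout shows: at least one .jpg file and a text.json that parses as JSON. The schema of text.json is not documented here, so it is deliberately left unchecked.

```python
import json
from pathlib import Path


def check_exemplar_folder(folder):
    """Return (image_paths, descriptions) for one relation folder.

    Raises FileNotFoundError if there are no .jpg files or no text.json.
    The content of text.json is loaded but its structure is not validated.
    """
    folder = Path(folder)
    images = sorted(folder.glob("*.jpg"))
    if not images:
        raise FileNotFoundError(f"no .jpg exemplar images found in {folder}")
    text_file = folder / "text.json"
    if not text_file.is_file():
        raise FileNotFoundError(f"missing {text_file}")
    with text_file.open(encoding="utf-8") as f:
        descriptions = json.load(f)
    return images, descriptions
```

For example, `check_exemplar_folder("./reversion_benchmark_v1/painted_on")` would return the sorted image paths and the parsed descriptions, assuming the benchmark has been downloaded to that location.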

:framed_picture: Generation

We can use the learned relation prompt <R> to generate relation-specific images with new objects, backgrounds, and style.

  1. You can obtain a learned <R> from Relation Inversion using your customized data. You can also download the models from here, where we provide several pre-trained relation prompts for you to play with.

  2. Put the models (<em>i.e.</em>, learned relation prompt <R>) under ./experiments/ as follows:

    ├── painted_on
    │   ├── checkpoint-500
    │   ...
    │   └── model_index.json
    ├── carved_by
    │   ├── checkpoint-500
    │   ...
    │   └── model_index.json
    ├── inside
    │   ├── checkpoint-500
    │   ...
    │   └── model_index.json
  3. Taking the relation painted_on as an example, you can either use the following script to generate images from a single prompt, e.g., "cat <R> stone":

    python inference.py \
    --model_id ./experiments/painted_on \
    --prompt "cat <R> stone" \
    --placeholder_string "<R>" \
    --num_samples 10 \
    --guidance_scale 7.5

    Or write a list of prompts in ./templates/templates.py with the key name $your_template_name and generate images for every prompt in the list $your_template_name:

    python inference.py \
    --model_id ./experiments/painted_on \
    --template_name $your_template_name \
    --placeholder_string "<R>" \
    --num_samples 10 \
    --guidance_scale 7.5

    Here, model_id is the model directory, num_samples is the number of images to generate for each prompt, and guidance_scale is the classifier-free guidance scale.

    We provide several example templates for each relation in ./templates/templates.py, such as painted_on_examples, carved_by_examples, etc.

    Note that if you saved the entire model during inversion, that is, without the only_save_embeds flag turned on, you should turn off the only_load_embeds flag during inference. The only_load_embeds option loads only the relation prompt <R> from the experiment folder and automatically loads the rest of the Stable Diffusion model (including the other text tokens' embeddings) from the default cache location that contains the pre-trained Stable Diffusion model.
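
When adding your own prompt list to ./templates/templates.py, it can be useful to check that every prompt actually contains the placeholder string, since a prompt without <R> never invokes the learned relation token. The sketch below assumes templates are stored as a dictionary that maps a template name to a list of prompt strings. The actual variable layout in templates.py may differ, and `my_templates`, `my_painted_on_prompts`, and `prompts_missing_placeholder` are hypothetical names used only for illustration.

```python
def prompts_missing_placeholder(templates, placeholder="<R>"):
    """Map each template name to the prompts that lack the placeholder."""
    missing = {}
    for name, prompts in templates.items():
        bad = [p for p in prompts if placeholder not in p]
        if bad:
            missing[name] = bad
    return missing


# Hypothetical template dictionary, for illustration only.
my_templates = {
    "my_painted_on_prompts": [
        "cat <R> stone",
        "michael jackson <R> wall, in the desert",
        "dog on wall",  # no placeholder, so it is reported below
    ],
}

print(prompts_missing_placeholder(my_templates))
```

An empty dictionary means every prompt in every list contains the placeholder.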

:hugs: Gradio Demo

:art: Diverse Generation

You can also combine the relation prompt <R> with diverse prompts to generate images with varied backgrounds and styles. For example, your prompt could be "michael jackson <R> wall, in the desert", "cat <R> stone, on the beach", <em>etc</em>.
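
Prompts of this form can be composed programmatically, for instance to fill a template list. The following is a small sketch under the assumption that each prompt follows the pattern "entity <R> entity" with an optional comma-separated context, as in the examples above. The helper name `build_prompts` is hypothetical and not part of the repository.

```python
from itertools import product


def build_prompts(entity_pairs, contexts, placeholder="<R>"):
    """Combine (entity A, entity B) pairs with optional context suffixes."""
    prompts = []
    for (a, b), context in product(entity_pairs, contexts):
        prompt = f"{a} {placeholder} {b}"
        if context:  # an empty context yields the bare relation prompt
            prompt += f", {context}"
        prompts.append(prompt)
    return prompts


pairs = [("michael jackson", "wall"), ("cat", "stone")]
contexts = ["", "in the desert", "on the beach"]
for prompt in build_prompts(pairs, contexts):
    print(prompt)
```

Each resulting string can be passed to inference.py through --prompt, or the whole list can be placed in a template.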


:straight_ruler: The ReVersion Benchmark

The ReVersion Benchmark consists of diverse relations and entities, along with a set of well-defined text descriptions.

:fountain_pen: Citation

If you find our repo useful for your research, please consider citing our paper:

     title={{ReVersion}: Diffusion-Based Relation Inversion from Images},
     author={Huang, Ziqi and Wu, Tianxing and Jiang, Yuming and Chan, Kelvin C.K. and Liu, Ziwei},
      booktitle={SIGGRAPH Asia 2024 Conference Papers},

:white_heart: Acknowledgement

The codebase is maintained by Ziqi Huang and Tianxing Wu.

This project is built using the following open source repositories: