# ReVersion (SIGGRAPH Asia 2024)
This repository contains the implementation of the following paper:
**ReVersion: Diffusion-Based Relation Inversion from Images**<br>
Ziqi Huang<sup>∗</sup>, Tianxing Wu<sup>∗</sup>, Yuming Jiang, Kelvin C.K. Chan, Ziwei Liu<br>
From MMLab@NTU affiliated with S-Lab, Nanyang Technological University
[[Paper](https://arxiv.org/abs/2303.13495)] | [[Project Page](https://ziqihuangg.github.io/projects/reversion.html)] | [[Video](https://www.youtube.com/watch?v=pkal3yjyyKQ)] | [[Dataset](https://drive.google.com/drive/folders/1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing)] | [[Huggingface Demo](https://huggingface.co/spaces/Ziqi/ReVersion)]

## :open_book: Overview
We propose a new task, **Relation Inversion**: given a few exemplar images in which the same relation co-exists, we aim to find a relation prompt `<R>` that captures this interaction, and then apply the relation to new entities to synthesize new scenes. The above images are generated by our ReVersion framework.
## :heavy_check_mark: Updates
- [03/2024] We optimized the code implementation. You only need to save and load the relation prompt, without having to save or load the entire text-to-image model.
- [08/2023] We released the training code for Relation Inversion.
- [04/2023] We released the ReVersion Benchmark.
- [04/2023] Integrated into Hugging Face 🤗 using Gradio. Try out the online demo.
- [03/2023] Arxiv paper available.
- [03/2023] Pre-trained models with relation prompts released at this link.
- [03/2023] Project page and video available.
- [03/2023] Inference code released.
## :hammer: Installation

- Clone Repo

  ```shell
  git clone https://github.com/ziqihuangg/ReVersion
  cd ReVersion
  ```

- Create Conda Environment and Install Dependencies

  ```shell
  conda create -n reversion
  conda activate reversion
  conda install python=3.8 pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch
  pip install diffusers["torch"]
  pip install -r requirements.txt
  ```
## :page_with_curl: Usage

### Relation Inversion

Given a set of exemplar images and coarse descriptions of their entities, you can optimize a relation prompt `<R>` to capture the relation that co-exists in these images, namely Relation Inversion.
- Prepare the exemplar images (<em>e.g.</em>, `0.jpg`-`9.jpg`) and coarse descriptions (`text.json`), and put them inside a folder. Feel free to use our ReVersion Benchmark, or you can also prepare your own images. An example from our ReVersion Benchmark is as follows:

  ```
  ./reversion_benchmark_v1
  ├── painted_on
  │   ├── 0.jpg
  │   ├── 1.jpg
  │   ├── 2.jpg
  │   ├── 3.jpg
  │   ├── 4.jpg
  │   ├── 5.jpg
  │   ├── 6.jpg
  │   ├── 7.jpg
  │   ├── 8.jpg
  │   ├── 9.jpg
  │   └── text.json
  ```
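As a sketch of what the coarse descriptions might look like (the exact schema of `text.json` in the benchmark may differ), each exemplar image can map to a list of short captions that name the entities and use the placeholder `<R>` for the relation being inverted:

```python
import json

# Hypothetical sketch of a coarse-description file; the schema used by the
# actual benchmark may differ. Each image maps to captions that name the
# entities and leave the relation to the placeholder token <R>.
descriptions = {
    "0.jpg": ["cat <R> stone", "a photo of a cat <R> a stone"],
    "1.jpg": ["dog <R> canvas"],
}

text_json = json.dumps(descriptions, indent=2)  # contents of a text.json file
loaded = json.loads(text_json)

# Every caption should contain the placeholder token.
assert all("<R>" in cap for caps in loaded.values() for cap in caps)
```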
- Take the relation `painted_on` for example; you can start training with this script:

  ```shell
  accelerate launch \
  --config_file="./configs/single_gpu.yml" \
  train.py \
  --seed="2023" \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="./reversion_benchmark_v1/painted_on" \
  --placeholder_token="<R>" \
  --initializer_token="and" \
  --train_batch_size="2" \
  --gradient_accumulation_steps="4" \
  --max_train_steps="3000" \
  --learning_rate='2.5e-04' --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps="0" \
  --output_dir="./experiments/painted_on" \
  --save_steps="1000" \
  --importance_sampling \
  --denoise_loss_weight="1.0" \
  --steer_loss_weight="0.01" \
  --num_positives="4" \
  --temperature="0.07" \
  --only_save_embeds
  ```
  where `train_data_dir` is the path to the exemplar images and coarse descriptions, and `output_dir` is the path where the inverted relation and the experiment logs are saved. To generate relation-specific images, follow the next section, Generation.

  Note that the `only_save_embeds` option saves only the relation prompt `<R>`, rather than the entire Stable Diffusion model. You can decide whether to turn it on.
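To illustrate why `only_save_embeds` keeps checkpoints small: only the single learned vector for `<R>` needs to be serialized, while everything else is recovered from the pre-trained model. The format below (JSON with a toy 3-dimensional vector) is purely hypothetical; the actual script saves real embedding tensors (768-dimensional for Stable Diffusion v1.5):

```python
import json

# Illustrative sketch only: with --only_save_embeds, a checkpoint holds just
# the learned embedding for the placeholder token <R>, not the full Stable
# Diffusion weights. The toy 3-dim vector and JSON format are hypothetical.
learned_embeds = {"<R>": [0.0312, -0.0178, 0.0054]}

checkpoint = json.dumps(learned_embeds)  # tiny, model-free checkpoint

# Loading later only needs to inject this single vector back into the text
# encoder's embedding table; the rest comes from the pre-trained model.
restored = json.loads(checkpoint)
assert set(restored) == {"<R>"}
```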
### :framed_picture: Generation

We can use the learned relation prompt `<R>` to generate relation-specific images with new objects, backgrounds, and styles.
- You can obtain a learned `<R>` from Relation Inversion using your customized data. You can also download the models from here, where we provide several pre-trained relation prompts for you to play with.
- Put the models (<em>i.e.</em>, learned relation prompts `<R>`) under `./experiments/` as follows:

  ```
  ./experiments/
  ├── painted_on
  │   ├── checkpoint-500
  │   ...
  │   └── model_index.json
  ├── carved_by
  │   ├── checkpoint-500
  │   ...
  │   └── model_index.json
  ├── inside
  │   ├── checkpoint-500
  │   ...
  │   └── model_index.json
  ...
  ```
- Take the relation `painted_on` for example; you can either use the following script to generate images from a single prompt, e.g., "cat <R> stone":

  ```shell
  python inference.py \
  --model_id ./experiments/painted_on \
  --prompt "cat <R> stone" \
  --placeholder_string "<R>" \
  --num_samples 10 \
  --guidance_scale 7.5 \
  --only_load_embeds
  ```
  Or write a list of prompts in `./templates/templates.py` under the key `$your_template_name`, and generate images for every prompt in that list:

  ```shell
  your_template_name='painted_on_examples'
  python inference.py \
  --model_id ./experiments/painted_on \
  --template_name $your_template_name \
  --placeholder_string "<R>" \
  --num_samples 10 \
  --guidance_scale 7.5 \
  --only_load_embeds
  ```
  where `model_id` is the model directory, `num_samples` is the number of images to generate for each prompt, and `guidance_scale` is the classifier-free guidance scale. We provide several example templates for each relation in `./templates/templates.py`, such as `painted_on_examples`, `carved_by_examples`, etc.

  Note that if you saved the entire model during the inversion process (that is, without the `only_save_embeds` flag turned on), then you should turn off the `only_load_embeds` flag during inference. The `only_load_embeds` option loads only the relation prompt `<R>` from the experiment folder, and automatically loads the rest of the Stable Diffusion model (including the other text tokens' embeddings) from the default cache location that contains the pre-trained Stable Diffusion model.
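A prompt-list entry in `./templates/templates.py` might be sketched as follows (the actual file's structure may differ; the `flower <R> vase` prompt is illustrative). The key is the name passed via `--template_name`, and each prompt uses the placeholder `<R>` for the learned relation:

```python
# Hypothetical sketch of a prompt list keyed by template name; the actual
# templates.py in the repo may be structured differently.
templates = {
    "painted_on_examples": [
        "cat <R> stone",
        "michael jackson <R> wall",
        "flower <R> vase",  # illustrative extra prompt
    ],
}

# Inference iterates over every prompt under the requested key.
prompts = templates["painted_on_examples"]
assert all("<R>" in p for p in prompts)
```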
## :hugs: Gradio Demo

- We also provide a Gradio demo to test our method through a UI. The demo supports relation-specific text-to-image generation on the fly. Running the following command will launch it:

  ```shell
  python app_gradio.py
  ```

- Alternatively, you can try the online demo here.
## :art: Diverse Generation

You can also combine the relation prompt `<R>` with additional descriptions to generate images with diverse backgrounds and styles. For example, your prompt could be "michael jackson <R> wall, in the desert", "cat <R> stone, on the beach", <em>etc</em>.
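Such diverse prompts can also be composed programmatically; the entity pairs and style suffixes below are just illustrative:

```python
# Sketch: compose diverse prompts around the learned relation token <R>.
# The entity pairs and style suffixes are illustrative examples.
pairs = [("michael jackson", "wall"), ("cat", "stone")]
styles = ["in the desert", "on the beach"]

prompts = [f"{head} <R> {tail}, {style}" for head, tail in pairs for style in styles]

assert "michael jackson <R> wall, in the desert" in prompts
assert "cat <R> stone, on the beach" in prompts
assert len(prompts) == len(pairs) * len(styles)  # 4 prompts in total
```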
## :straight_ruler: The ReVersion Benchmark
The ReVersion Benchmark consists of diverse relations and entities, along with a set of well-defined text descriptions.
- <b>Relations and Entities</b>. We define ten representative object relations with different levels of abstraction, ranging from basic spatial relations (e.g., "on top of") and entity interactions (e.g., "shakes hands with") to abstract concepts (e.g., "is carved by"). A wide range of entities, such as animals, humans, and household items, is involved to further increase the diversity of the benchmark.
- <b>Exemplar Images and Text Descriptions</b>. For each relation, we collect four to ten exemplar images containing different entities. We further annotate several text templates for each exemplar image, describing it at different levels of detail. These training templates can be used to optimize the relation prompt.
- <b>Benchmark Scenarios</b>. For each of the ten relations, we design 100 inference templates composed of different object entities.
## :fountain_pen: Citation

If you find our repo useful for your research, please consider citing our paper:

```bibtex
@article{huang2023reversion,
  title={{ReVersion}: Diffusion-Based Relation Inversion from Images},
  author={Huang, Ziqi and Wu, Tianxing and Jiang, Yuming and Chan, Kelvin C.K. and Liu, Ziwei},
  journal={arXiv preprint arXiv:2303.13495},
  year={2023}
}
```
## :white_heart: Acknowledgement
The codebase is maintained by Ziqi Huang and Tianxing Wu.
This project is built using the following open source repositories: