
ITI-GEN: Inclusive Text-to-Image Generation

[paper] [arXiv] [video] [poster]

Overview

<p align="center"> <img src="docs/teaser.png" width="600px"/> </p>

ITI-GEN: Inclusive Text-to-Image Generation<br> Cheng Zhang, Xuanbai Chen, Siqi Chai, Chen Henry Wu, Dmitry Lagun, Thabo Beeler, Fernando De la Torre <br> ICCV 2023 Oral, Best Paper Finalist

This repo contains the code for training ITI-GEN and for generating images that uniformly span the categories of selected attributes. The main idea behind our approach is to leverage reference images to better represent diverse attributes.
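For intuition, below is a minimal PyTorch sketch of the paper's directional-alignment idea (a paraphrase for illustration, not the actual code in train_iti_gen.py): the learned "inclusive token" embeddings for two categories of an attribute are trained so that the difference between their prompt embeddings aligns with the difference between the averaged CLIP image embeddings of the categories' reference images.

```python
# Directional alignment, sketched: the prompt-embedding direction between two
# categories should match the reference-image embedding direction. Function
# and argument names here are illustrative, not from the repo.
import torch.nn.functional as F

def directional_alignment_loss(img_feats_a, img_feats_b, txt_feat_a, txt_feat_b):
    """img_feats_*: [N, D] CLIP features of each category's reference images;
    txt_feat_*: [D] CLIP features of the prompt plus learned inclusive tokens."""
    img_dir = img_feats_a.mean(0) - img_feats_b.mean(0)  # image-side direction
    txt_dir = txt_feat_a - txt_feat_b                    # prompt-side direction
    return 1.0 - F.cosine_similarity(img_dir, txt_dir, dim=0)
```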

Updates

[Oct 28 2023] Evaluation code added (see Evaluation below).

[Sep 18 2023] Code released. Generation with Stable Diffusion is supported; support for ControlNet and InstructPix2Pix will be added later.

[Sep 11 2023] Paper released on arXiv.


Installation

The code has been tested with the environment specified in environment.yml. Set it up as follows:

git clone https://github.com/humansensinglab/ITI-GEN.git
cd ITI-GEN
conda env create --name iti-gen --file=environment.yml
source activate iti-gen
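(Optional) A quick sanity check that the environment is usable. This assumes PyTorch is included in environment.yml, which ITI-GEN training and Stable Diffusion generation both require:

```python
# Confirm PyTorch imports and a CUDA device is visible.
import torch
print(torch.__version__, "| CUDA available:", torch.cuda.is_available())
```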

Data Preparation

<p align="center"> <img src="docs/fig_sample.png" width="600px"/> </p>
1. We provide processed reference images as follows:

   | Dataset  | Description           | Attribute Used              | Google Drive |
   |----------|-----------------------|-----------------------------|--------------|
   | CelebA   | Real face images      | 40 binary facial attributes | Link         |
   | FairFace | Real face images      | Age with 9 categories       | Link         |
   | FAIR     | Synthetic face images | Skin tone with 6 categories | Link         |
   | LHQ      | Natural scenes        | 11 global scene attributes  | Link         |

Save the .zip files and unzip the downloaded reference images under the data/ directory:

|-- data
|   |-- celeba
|   |   |-- 5_o_Clock_Shadow
|   |   |-- Bald
|   |   |-- ...

|   |-- FAIR_benchmark
|   |   |-- Skin_tone

|   |-- fairface
|   |   |-- Age

|   |-- lhq
|   |   |-- Bright
|   |   |-- Colorful
|   |   |-- ...
2. (Optional) You can also construct customized reference images under the data/ directory:
|-- data
|   |-- custom_dataset_name
|   |   |-- Attribute_1
|   |   |   |-- Category_1
|   |   |   |-- Category_2
|   |   |   |-- ...
|   |   |-- Attribute_2
|   |   |-- ...

Then modify the corresponding functions in util.py so the custom dataset is recognized; a minimal sketch of parsing this layout is shown below.
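For illustration, here is a hypothetical helper (the names are ours, not from util.py) showing how the data/<dataset>/<Attribute>/<Category>/ layout above can be parsed:

```python
# Discover attributes and categories from the folder layout sketched above.
# This is an illustrative helper, not the repo's actual util.py code.
from pathlib import Path

def discover_attributes(dataset_dir):
    """Map each attribute folder to a sorted list of its category folders."""
    root = Path(dataset_dir)
    return {
        attr.name: sorted(cat.name for cat in attr.iterdir() if cat.is_dir())
        for attr in sorted(root.iterdir()) if attr.is_dir()
    }

print(discover_attributes("data/custom_dataset_name"))
# e.g. {'Attribute_1': ['Category_1', 'Category_2'], ...}
```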

Training ITI-GEN

<p align="center"> <img src="docs/loss.png" width="400px"/> </p>

1. Train on the human domain (takes only a few minutes):

python train_iti_gen.py \
    --prompt='a headshot of a person' \
    --attr-list='Male,Skin_tone,Age' \
    --epochs=30 \
    --save-ckpt-per-epochs=10

2. Train on the scene domain (takes only a few minutes):

python train_iti_gen.py \
    --prompt='a natural scene' \
    --attr-list='Colorful' \
    --epochs=30 \
    --save-ckpt-per-epochs=10
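Checkpoints are saved under ckpts/; the generation commands below load, for example, ckpts/a_headshot_of_a_person_Male_Skin_tone_Age/original_prompt_embedding/basis_final_embed_19.pt. A quick way to inspect a saved embedding (the exact contents are an assumption here; adapt to what train_iti_gen.py actually writes):

```python
# Peek inside a trained ITI-GEN embedding checkpoint. Whether it is a tensor
# or a dict of tensors is an assumption; adjust as needed.
import torch

obj = torch.load(
    "ckpts/a_headshot_of_a_person_Male_Skin_tone_Age/"
    "original_prompt_embedding/basis_final_embed_19.pt",
    map_location="cpu",
)
if isinstance(obj, dict):
    for key, val in obj.items():
        print(key, getattr(val, "shape", type(val)))
else:
    print(type(obj), getattr(obj, "shape", None))
```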

(Optional) Prompt Prepending

<p align="center"> <img src="docs/fig_framework.png"/> </p>

1. Prepend on the human domain

python prepend.py \
    --prompt='a headshot of a person' \
    --attr-list='Male,Skin_tone,Age' \
    --load-model-epoch=19 \
    --prepended-prompt='a headshot of a doctor'

2. Prepend on the scene domain

python prepend.py \
    --prompt='a natural scene' \
    --attr-list='Colorful' \
    --load-model-epoch=19 \
    --prepended-prompt='an alien pyramid landscape, art station, landscape, concept art, illustration, highly detailed artwork cinematic'

Generation

Our ITI-GEN training is standalone from generative models such as Stable Diffusion, ControlNet, and InstructPix2Pix. Here we show one example of how to use ITI-GEN to generate images with Stable Diffusion; a conceptual sketch of how the learned embeddings plug into the sampler follows the generation commands below.

Stable Diffusion installation

cd models
git clone https://github.com/CompVis/stable-diffusion.git
# ITI-GEN has been tested with this version: https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
# Due to license issues, we cannot share the pre-trained checkpoints directly.
# Download the checkpoint yourself and place it at <path/to/sd-v1-4.ckpt>.
mv stable-diffusion sd
mkdir -p sd/models/ldm/stable-diffusion-v1/
ln -s <path/to/sd-v1-4.ckpt> sd/models/ldm/stable-diffusion-v1/model.ckpt
cd sd
pip install -e .
cd ../..

Image generation

1. Generation on the human domain

<p align="center"> <img src="docs/multi_category.png" style="margin-right: 10px;" width="370px"> <img src="docs/multi_category_man.png" width="370px"> </p>
python generation.py \
    --config='models/sd/configs/stable-diffusion/v1-inference.yaml' \
    --ckpt='models/sd/models/ldm/stable-diffusion-v1/model.ckpt' \
    --plms \
    --attr-list='Male,Skin_tone,Age' \
    --outdir='./ckpts/a_headshot_of_a_person_Male_Skin_tone_Age/original_prompt_embedding/sample_results' \
    --prompt-path='./ckpts/a_headshot_of_a_person_Male_Skin_tone_Age/original_prompt_embedding/basis_final_embed_19.pt' \
    --n_iter=5 \
    --n_rows=5 \
    --n_samples=1

2. Generation on the scene domain

<p align="center"> <img src="docs/scene_3.png" style="margin-right: 10px;" width="370px"> <img src="docs/scene_4.png" width="370px"> </p>
python generation.py \
    --config='models/sd/configs/stable-diffusion/v1-inference.yaml' \
    --ckpt='models/sd/models/ldm/stable-diffusion-v1/model.ckpt' \
    --plms \
    --attr-list='Colorful' \
    --outdir='./ckpts/a_natural_scene_Colorful/original_prompt_embedding/sample_results' \
    --prompt-path='./ckpts/a_natural_scene_Colorful/original_prompt_embedding/basis_final_embed_19.pt' \
    --n_iter=5 \
    --n_rows=5 \
    --n_samples=1
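For intuition, here is roughly how the learned embedding replaces the usual text conditioning in the CompVis codebase. This is a hedged sketch: it assumes the saved file is a [num_combinations, 77, 768] conditioning tensor, which may not match generation.py exactly.

```python
# Sketch: sample from Stable Diffusion using an ITI-GEN embedding as the
# conditioning. Paths match the commands above; the embedding format is an
# assumption, not the repo's documented interface.
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config
from ldm.models.diffusion.plms import PLMSSampler

config = OmegaConf.load("models/sd/configs/stable-diffusion/v1-inference.yaml")
model = instantiate_from_config(config.model)
state = torch.load("models/sd/models/ldm/stable-diffusion-v1/model.ckpt",
                   map_location="cpu")["state_dict"]
model.load_state_dict(state, strict=False)
model = model.cuda().eval()

c = torch.load("ckpts/a_headshot_of_a_person_Male_Skin_tone_Age/"
               "original_prompt_embedding/basis_final_embed_19.pt").cuda()
uc = model.get_learned_conditioning([""])  # unconditional embedding for CFG

sampler = PLMSSampler(model)  # matches the --plms flag above
samples, _ = sampler.sample(
    S=50, conditioning=c[:1], batch_size=1, shape=[4, 64, 64], verbose=False,
    unconditional_guidance_scale=7.5, unconditional_conditioning=uc, eta=0.0,
)
images = model.decode_first_stage(samples)  # latents -> images in [-1, 1]
```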

We are actively adding more features to this repo. Please stay tuned!

Evaluation

We use CLIP, which we found superior to pre-trained classifiers, to evaluate most attributes. For attributes where CLIP may be erroneous, we combine its results with human evaluations. The script outputs quantitative results for both KL divergence and the FID score (the latter computed with CleanFID).

python evaluation.py \
    --img-folder '/path/to/image/folder/you/want/to/evaluate' \
    --class-list 'a headshot of a person wearing eyeglasses' 'a headshot of a person'
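For reference, a minimal sketch of the CLIP-plus-KL part. It assumes OpenAI's clip package and a uniform target distribution; evaluation.py may differ in details.

```python
# Classify each generated image with CLIP zero-shot over the class prompts,
# then compute the KL divergence of the empirical distribution from uniform.
from pathlib import Path

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

classes = ["a headshot of a person wearing eyeglasses",
           "a headshot of a person"]
text = clip.tokenize(classes).to(device)

counts = torch.zeros(len(classes))
for path in sorted(Path("/path/to/image/folder").glob("*.png")):
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
    counts[logits_per_image.argmax(dim=-1).item()] += 1

p = counts / counts.sum()                    # empirical category distribution
u = torch.full_like(p, 1.0 / len(classes))   # ideal (uniform) distribution
kl = (p * torch.log((p + 1e-12) / u)).sum()  # KL(p || uniform)
print(p.tolist(), kl.item())
```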

Note that the FID score can be affected by various factors, such as the number of images. Each FID score in our paper is computed using more than 5K images. As a sanity check, we suggest directly comparing against the FID score of images from baseline Stable Diffusion in the same setup. Please refer to Section 4.1 (Quantitative Metrics) in the main paper and Section D in the supplementary materials for more details.
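The FID part can be reproduced with CleanFID directly (the folder paths below are placeholders):

```python
# Compute FID with CleanFID between generated images and a reference set
# produced in the same setup (e.g., baseline Stable Diffusion outputs).
from cleanfid import fid

score = fid.compute_fid("/path/to/iti_gen/images",
                        "/path/to/reference/images")
print(f"FID: {score:.2f}")
```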


Citation

If you find this repo useful, please cite:

@inproceedings{zhang2023inclusive,
  title={{ITI-GEN}: Inclusive Text-to-Image Generation},
  author={Zhang, Cheng and Chen, Xuanbai and Chai, Siqi and Wu, Chen Henry and Lagun, Dmitry and Beeler, Thabo and De la Torre, Fernando},
  booktitle = {ICCV},
  year={2023}
}

License

We use the X11 License. This license is identical to the MIT License, but with an extra sentence that prohibits using the copyright holders' names (Carnegie Mellon University and Google in our case) for advertising or promotional purposes without written permission.