
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

This is the official repository of Frido. We now support training and testing for text-to-image, layout-to-image, scene-graph-to-image, and label-to-image on COCO/VG/OpenImage. Please stay tuned!

Frido demo

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis
Wan-Cyuan Fan, Yen-Chun Chen, Dongdong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang


☀️Important updates


☀️News

We provide a web version of the demo here to help researchers better understand our work. The web demo contains multiple animations explaining the diffusion and denoising processes of Frido, along with more qualitative experimental results. We hope it's useful!


🐧TODO

Frido codebase


Machine environment


Requirements

A conda environment named frido can be created and activated with:

conda env create -f environment.yaml
conda activate frido

Datasets setup

We provide two approaches to set up the datasets:

🎶 Auto-download

To automatically download the datasets and save them to the default path (../), please use the following scripts:

bash tools/datasets/download_coco.sh
bash tools/datasets/download_vg.sh
bash tools/datasets/download_openimage.sh

🎶 Manual setup

COCO 2014 split (T2I)

COCO-stuff 2017

Standard split (Layout2I & Label2I)
Segmentation challenge split (Layout2I & SG2I)

Visual Genome (Layout2I & SG2I)

python3 TODO.py [VG_DIR_PATH]

OpenImage (Layout2I)

File structure for dataset and code

Please make sure that the file structure matches the following; otherwise, modify the config files to point to the corresponding paths. A small sanity-check sketch follows the structure below.

<details><summary>File structure</summary>

datasets
├── coco
│   ├── 2014
│   │   ├── annotations
│   │   ├── val2014
│   │   └── ...
│   └── 2017
│       ├── annotations
│       ├── val2017
│       └── ...
├── vg
└── openimage
Frido
├── configs
│   ├── frido
│   └── ...
├── exp
│   ├── t2i
│   │   └── frido_f16f8_coco
│   │       └── checkpoints
│   │           └── model.ckpt
│   ├── layout2i
│   └── ...
├── frido
├── scripts
├── tools
└── ...
</details>
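
As a quick sanity check, a short script along these lines can confirm the expected paths exist before training or testing (a minimal sketch, not part of the repo; the paths assume the default ../ dataset location and the checkpoint layout shown above):

from pathlib import Path

# Paths expected by the default configs (datasets live one level above the Frido repo).
expected = [
    "../datasets/coco/2014/annotations",
    "../datasets/coco/2017/annotations",
    "../datasets/vg",
    "../datasets/openimage",
    "exp/t2i/frido_f16f8_coco/checkpoints/model.ckpt",
]

missing = [p for p in expected if not Path(p).exists()]
print("Missing paths:", missing if missing else "none -- layout looks good")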

Download pre-trained models

Microsoft's blob storage no longer allows anonymous downloads per the latest company-wide security policy, so the pretrained weights may not be available through the following script. Please kindly download the checkpoints from Google Drive instead.

The following table describes the tasks and models that are currently available. To auto-download (using azcopy) all Frido model checkpoints, please use the following command:

bash tools/download.sh

You may also download them manually from the download links shown below.

| Task | Dataset | FID | Link (TODO) | Comments |
|---|---|---|---|---|
| Text-to-image | COCO 2014 | 11.24 | Google drive | |
| Text-to-image (mini) | COCO 2014 | 64.85 | Google drive | 1000 images of mini-val; FID was calculated against corresponding GT images. |
| Text-to-image | COCO 2014 | 10.74 | Google drive | CLIP encoder from stable diffusion (not CLIP re-ranking) |
| Scene-graph-to-image | COCO-stuff 2017 | 46.11 | Google drive | Data preprocessing same as sg2im. |
| Scene-graph-to-image | Visual Genome | 31.61 | Google drive | Data preprocessing same as sg2im. |
| Label-to-image | COCO-stuff | 27.65 | Google drive | 2-30 instances |
| Label-to-image | COCO-stuff | 47.39 | Google drive | 3-8 instances |
| Layout-to-image | COCO (finetuned from OpenImage) | 37.14 | Google drive | FID calculated on 2,048 val images. |
| Layout-to-image (mini) | COCO (finetuned from OpenImage) | 121.23 | Google drive | 320 images of mini-val; FID was calculated against corresponding GT images. |
| Layout-to-image | OpenImage | 29.04 | Google drive | FID calculated on 2,048 val images. |
| Layout-to-image | Visual Genome | 17.24 | Google drive | DDIM 250 steps. Weights initialized from coco-f8f4. |

The mini versions are for quick testing and reproduction; they can be run within 1 hour on a single V100, and high FID is expected. To evaluate generation quality, the full validation/test split needs to be run.

FID scores were evaluated using torch-fidelity. The scores may fluctuate slightly due to the inherent random initial noise of diffusion models.
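
If you need more deterministic runs for comparison, one option is to fix the random seeds before sampling. This is a generic PyTorch sketch; the exact seed handling in Frido's scripts may differ:

import random
import numpy as np
import torch

def seed_everything(seed: int = 42):
    # Make the initial diffusion noise (and other RNG draws) reproducible.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)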


🌲Inference Frido

We now provide scripts for testing Frido.

Quick Start

Please check out the Jupyter notebook demo.ipynb for a simple demo of text-to-image generation on COCO.

Once the datasets and model weights are properly set up, one may test Frido with the following commands.

Text-to-image

# for full validation:
bash tools/frido/eval_t2i.sh

# for mini-val:
bash tools/frido/eval_t2i_minival.sh

Layout-to-image

# for full validation:
bash tools/frido/eval_layout2i.sh

# for mini-val:
bash tools/frido/eval_layout2i_minival.sh

The default output folder is exp/layout2i/frido_f8f4/samples.

(Optional) You can modify the script by adding the following arguments.

Multi-GPU testing

We provide code for multi-GPU testing. Please refer to the script tools/eval_t2i_multiGPU.sh.

For example, 4-GPU inference can be run as follows.

bash eval_t2i_multiGPU.sh 4

🌱Train Frido

We provide some sample scripts for training Frido.

Once the datasets and model weights are properly set up, one may train Frido with the following commands.

MS-VQGAN

bash tools/msvqgan/train_msvqgan_f16f8_coco.sh

Frido

bash tools/frido/train_t2i_f16f8_coco.sh

(Optional) You can modify the script by adding the following arguments; bold denotes default settings.

Multi-GPU training

For multi-GPU training, please modify the --gpus argument in the training scripts as follows.

For single-GPU training:

python main.py --base [CONFIGS] -t True --gpus 1 -log_dir [LOG_DIR] -n [EXP_NAME]

For 8-GPU training:

python main.py --base [CONFIGS] -t True --gpus 0,1,2,3,4,5,6,7 -log_dir [LOG_DIR] -n [EXP_NAME]

Evaluation

FID & SceneFID

FID scores were evaluated using torch-fidelity.

After running inference, the FID score can be computed with the following command:

fidelity --gpu 0 --fid --input2 [GT_FOLDER] --input1 [PRED_FOLDER]

Example:

fidelity --gpu 0 --fid --input2 exp/t2i/frido_f16f8/samples/.../img/inputs --input1 exp/t2i/frido_f16f8/samples/.../img/sample
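
torch-fidelity can also be called from Python instead of the CLI. A minimal sketch (the folder paths are placeholders for your own ground-truth and sample directories):

import torch_fidelity

# input1 = generated samples, input2 = ground-truth images (same convention as the CLI).
metrics = torch_fidelity.calculate_metrics(
    input1="path/to/pred_folder",
    input2="path/to/gt_folder",
    cuda=True,
    fid=True,
)
print(metrics["frechet_inception_distance"])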

CLIPscore

Please refer to EMNLP 2021 CLIPScore.
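
For reference, the paper defines CLIPScore of a caption-image pair as w * max(cos(image, text), 0) with w = 2.5. A minimal sketch using the openai/clip package follows (an illustrative re-implementation; please use the official CLIPScore repo for reported numbers):

import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_score(image_path: str, caption: str, w: float = 2.5) -> float:
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([caption]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)
    # Cosine similarity of L2-normalized embeddings, clipped at zero.
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return w * max((img_feat * txt_feat).sum().item(), 0.0)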

Detection score (YOLO)

We use YOLOv4 as the pre-trained detector to calculate the detection score. Please refer to YOLOv4.

IS/Precision/Recall

We use the scripts in ADM to calculate IS, precision, and recall.

PSNR/SSIM

To evaluate reconstruction performance, we use PSNR and SSIM; both metrics are available in standard Python packages.
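
A minimal sketch using scikit-image, one common choice for these metrics (assumes aligned, same-size RGB image pairs; paths are placeholders):

import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def psnr_ssim(gt_path: str, pred_path: str):
    gt = np.array(Image.open(gt_path).convert("RGB"))
    pred = np.array(Image.open(pred_path).convert("RGB"))
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    # channel_axis=-1 treats the last axis as color channels (scikit-image >= 0.19).
    ssim = structural_similarity(gt, pred, data_range=255, channel_axis=-1)
    return psnr, ssim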

Acknowledgement

We build the Frido codebase heavily on the codebases of Latent Diffusion Models (LDM) and VQGAN. We sincerely thank the authors for open-sourcing their code!

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{fan2022frido,
  title={Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis},
  author={Fan, Wan-Cyuan and Chen, Yen-Chun and Chen, Dongdong and Cheng, Yu and Yuan, Lu and Wang, Yu-Chiang Frank},
  booktitle={AAAI},
  year={2023}
}

License

MIT