Image Synthesis from Layout with Locality-Aware Mask Adaptation
This repository contains a PyTorch implementation of our ICCV 2021 paper Image Synthesis from Layout with Locality-Aware Mask Adaptation.
Abstract
This paper is concerned with synthesizing images conditioned on a layout (a set of bounding boxes with object categories). Existing works construct a layout-to-mask-to-image pipeline: object masks are generated separately and mapped to bounding boxes to form a whole semantic segmentation mask (layout-to-mask), from which a new image is generated (mask-to-image). However, overlapping boxes in the layout result in overlapping object masks, which reduces the clarity of the mask and causes confusion in image generation.
We hypothesize that generating clean and semantically clear semantic masks is crucial. The hypothesis is supported by the finding that the performance of the state-of-the-art LostGAN decreases when its input masks are tainted. Motivated by this hypothesis, we propose a Locality-Aware Mask Adaption (LAMA) module to adapt overlapping or nearby object masks during generation. Experimental results show that our model with LAMA outperforms existing approaches in both visual fidelity and alignment with the input layouts. On COCO-Stuff at 256×256 resolution, our method improves the state-of-the-art FID from 41.65 to 31.12 and the SceneFID from 22.00 to 18.64.
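As a rough illustration of the layout-to-mask step (not the paper's implementation; all names below are illustrative), per-object masks are pasted into their boxes to form a class-wise semantic mask. Overlapping boxes write into the same region of the canvas, which is exactly the ambiguity that LAMA learns to resolve by adapting the per-object masks before composition:

```python
import torch
import torch.nn.functional as F

def compose_semantic_mask(obj_masks, boxes, labels, num_classes, size):
    """Paste per-object masks into their boxes to build a (C, H, W) semantic mask.

    obj_masks: list of 2D tensors in [0, 1], one per object (in box coordinates)
    boxes:     list of integer (x0, y0, x1, y1) boxes in pixel coordinates
    labels:    list of category indices
    Overlapping boxes contribute to the same region, blurring the semantics
    there; the learned LAMA adaptation itself is not shown here.
    """
    H, W = size
    canvas = torch.zeros(num_classes, H, W)
    for mask, (x0, y0, x1, y1), c in zip(obj_masks, boxes, labels):
        resized = F.interpolate(mask[None, None], size=(y1 - y0, x1 - x0),
                                mode="bilinear", align_corners=False)[0, 0]
        canvas[c, y0:y1, x0:x1] = torch.max(canvas[c, y0:y1, x0:x1], resized)
    return canvas
```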
Main Pipeline
Main Results
Implementation
The environment is tested on Ubuntu 16.04 with CUDA 10.1 and an NVIDIA RTX 2080 Ti. The code is written in PyTorch 1.6, and the conda environment specifications are provided in LAMA.yaml, LAMA_tf.yaml and LAMA_YOLO.yaml.
Pretrained Models
We provide pre-trained models for COCO and VG via Google Drive and Weiyun.
Please put all pretrained models under pretrained_models/
Installation
Environment
Create an environment in conda
conda env create -f LAMA_tf.yaml
conda env create -f LAMA.yaml
conda activate LAMA
pip install tensorboardX pycocotools
Setup for roi_layers
python setup.py build develop
Data
Download COCO dataset to datasets/coco
bash scripts/download_coco.sh
Download VG dataset to datasets/vg
bash scripts/download_vg.sh
python scripts/preprocess_vg.py
Training
The training process uses the PyTorch DistributedDataParallel module (a minimal sketch of this setup pattern follows the commands below).
conda activate LAMA
export CUDA_VISIBLE_DEVICES=0; python -m torch.distributed.launch --nproc_per_node=1 train.py --img_size 128 --batch_size 20 --out_path experiment/coco_128/
With multiple GPUs, the training command can be
export CUDA_VISIBLE_DEVICES=0,1,2,3; python -m torch.distributed.launch --nproc_per_node=4 train.py --img_size 128 --batch_size 20 --out_path experiment/coco_128/
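For reference, this is a minimal sketch of the DistributedDataParallel setup pattern that torch.distributed.launch assumes; the actual train.py in this repository may differ in its details:

```python
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # set by torch.distributed.launch
args = parser.parse_args()

dist.init_process_group(backend="nccl")   # one process per GPU
torch.cuda.set_device(args.local_rank)

model = torch.nn.Linear(128, 128).cuda()  # placeholder for the actual networks
model = DDP(model, device_ids=[args.local_rank])  # gradients synchronized across processes
```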
Testing
We provide examples of using the pretrained models and calculating the evaluation metrics.
Run Pretrained Model
python test.py --dataset coco --model_path pretrained_models/coco_128.pth --sample_path samples/ --gpu 1
Inception Score
conda activate LAMA_tf
python scores/InceptionScore.py samples/coco128_repeat5_thres2.0/ --gpu 0
FID
The validation images are first extracted; FID is then computed between the extracted validation images and the generated samples (a short sketch of the Fréchet distance computation follows the commands).
conda activate LAMA
python utils/extract_val.py --dataset coco --img_size 128
conda activate LAMA_tf
python scores/FID.py datasets/coco/val_128/ samples/coco128_repeat5_thres2.0/ --gpu 0 --lowprofile
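For reference, FID is the Fréchet distance between Gaussians fitted to Inception features of the real and generated images; below is a minimal sketch of that final computation (the feature extraction performed by scores/FID.py is not shown):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID between two Gaussians (mu, sigma) fitted to Inception activations."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):  # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```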
Diversity Score
conda activate LAMA
python test.py --dataset coco --model_path pretrained_models/coco_128.pth --DS -r 2 -N --img_size 128 --gpu 0
SceneFID
We first extract object crops from the validation set and generate the corresponding object crops from the model. SceneFID is then computed between the two sets of crops (a rough sketch of the cropping step follows the commands).
conda activate LAMA
python utils/extract_cropped_objects.py --dataset coco --img_size 128 --cropped_size 224
python test.py --dataset coco --model_path pretrained_models/coco_128.pth --img_size 128 -N --cropped_size 224 --sample_path samples/cropped_224/ --gpu 0
conda activate LAMA_tf
python scores/FID.py datasets/coco/val_128_cropped_224 samples/cropped_224/ --gpu 0 --lowprofile
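Conceptually, the cropping step cuts every annotated object out of its image and resizes it to --cropped_size; a rough sketch is shown below (the function and argument names are illustrative, not the repository's API):

```python
from PIL import Image

def crop_objects(image_path, boxes, cropped_size=224):
    """Crop each (x0, y0, x1, y1) box from the image and resize it to a square."""
    img = Image.open(image_path).convert("RGB")
    return [
        img.crop((x0, y0, x1, y1)).resize((cropped_size, cropped_size), Image.BILINEAR)
        for (x0, y0, x1, y1) in boxes
    ]
```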
CAS
We use the classification implementation from https://github.com/hysts/pytorch_image_classification. The validation accuracy in the last epoch is taken as the CAS score.
Setup
conda activate LAMA
git clone https://github.com/hysts/pytorch_image_classification.git
cd pytorch_image_classification
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
pip install thop==0.0.31.post2004070130
pip install fvcore termcolor yacs
cd ../..
Testing
Generate training and testing sets.
conda activate LAMA
python utils/extract_cropped_objects.py --dataset coco --img_size 128 --cropped_size 32
python test.py --dataset coco --model_path pretrained_models/coco_128.pth --img_size 128 -N --cropped_size 32 --sample_path samples/cropped_32/ --gpu 0
Run classification.
cd pytorch_image_classification
mkdir coco_128
cd coco_128
ln -s ../../datasets/coco/val_128_cropped_32/ val
ln -s ../../samples/cropped_32/coco128_repeat5_thres2.0_cropped_32/ train
cd ..
mkdir experiments
sed -i '/macs/d' train.py
sed -i '/n_params/d' train.py
python train.py --config configs/cifar/resnet.yaml dataset.name ImageNet dataset.dataset_dir coco_128/ train.output_dir experiments/coco_128/ dataset.n_classes 184
cd ..
YOLO Scores
Generating Images
conda activate LAMA
python test.py --dataset coco --model_path pretrained_models/coco_128.pth --sample_path samples/ -r 1 --image_id_savepath image_id.txt
Config Environment
YOLO
cd yolo_experiments
conda env create -f LAMA_YOLO.yaml
git clone https://github.com/AlexeyAB/darknet.git
Ground truth
cp ../datasets/coco/annotations/instances_val2017.json data
COCO API
conda activate LAMA_YOLO
cd data
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py install
cd ../../..
The last command returns the terminal to yolo_experiments/.
Testing
In test.py, --image_path is the directory of the generated images and --imageid_path points to image_id.txt, which records the order of the generated images.
cd data
conda activate LAMA_YOLO
ln -s ../../datasets/coco/val2017/ val2017
python test.py --imageid_path ../../image_id.txt --image_path ../../samples/coco128_repeat1_thres2.0
Note that we use image_id.txt to specify the validation layouts of the generated images. The generated images are named sample_0.jpg, sample_1.jpg, and so on, consistent with the order in image_id.txt.
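As an illustration, the pairing between the generated files and the ids in image_id.txt can be recovered as follows (a sketch, assuming one image id per line):

```python
# Map sample_i.jpg back to the image id on line i of image_id.txt.
with open("image_id.txt") as f:
    image_ids = [line.strip() for line in f if line.strip()]

pairs = [(f"sample_{i}.jpg", image_id) for i, image_id in enumerate(image_ids)]
print(pairs[:3])
```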
Acknowledgement
This paper is supported by the National Science and Technology Innovation 2030 Major Project (2018AAA0100703) of the Ministry of Science and Technology of China, the National Natural Science Foundation of China (61773336, 62006208), and the Provincial Key Research and Development Plan of Zhejiang Province (2019C03137). Zejian Li would like to thank Pei Chen and Yongxing He at Zhejiang University for helpful comments, and Wei Sun for kindly answering questions regarding LostGANs.
Reference
- LostGAN: https://github.com/WillSuen/LostGANs/
- Image Generation from Scene Graphs: https://github.com/google/sg2im
- Faster R-CNN and Mask R-CNN in PyTorch 1.0: https://github.com/facebookresearch/maskrcnn-benchmark
- YOLOv4: https://github.com/AlexeyAB/darknet
- CAS: https://github.com/hysts/pytorch_image_classification/
Citation
@inproceedings{LAMA,
  author    = {Zejian Li and Jingyu Wu and Immanuel Koh and Yongchuan Tang and Lingyun Sun},
  title     = {Image Synthesis from Layout with Locality-Aware Mask Adaption},
  booktitle = {IEEE International Conference on Computer Vision (ICCV)},
  publisher = {IEEE},
  year      = {2021},
  pages     = {13819--13828}
}