
VICTR: Visual Information Captured Text Representation for Text-to-Image Generation Tasks

This repository contains the code for the paper VICTR: Visual Information Captured Text Representation for Text-to-Image Generation Tasks.

<h4 align="center"> <b>Han, C.*, Long, S.*, Luo, S., Wang, K., & Poon, J. (2020, December). <br/><a href="https://www.aclweb.org/anthology/2020.coling-main.277.pdf">VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks</a><br/>In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pp. 3107-3117</b> </h4>


1. Introduction

The proposed VICTR representation for text-to-image multimodal tasks contains two major types of embedding: (1) basic graph embedding (for objects, relations, and attributes) and (2) positional graph embedding (for objects and relations), which capture rich visual semantic information about the objects in a text description. This repository provides the integration of the proposed VICTR representation into the three original text-to-image generation models: stackGAN, attnGAN, and DM-GAN.

2. Main code structure and running requirements

```
Root                    ---> repository root
├── code                ---> main code for the three models
│   ├── stackgan_victr  ---> stackGAN+VICTR
│   ├── attngan_victr   ---> attnGAN+VICTR
│   └── dmgan_victr     ---> DM-GAN+VICTR
├── DAMSMencoders       ---> pretrained DAMSM text/image encoder from attnGAN
├── data
│   └── coco            ---> COCO2014 images and related data files
│       ├── train       ---> train-related data files
│       └── test        ---> test-related data files
└── output              ---> model output
```

Environment for running the code:
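An assumed environment, mirroring the upstream attnGAN/DM-GAN codebases (this package list is an assumption, not pinned by this repository):

```shell
# Assumed dependencies (based on the upstream attnGAN/DM-GAN repos; versions not pinned here).
pip install torch torchvision python-dateutil easydict pandas torchfile nltk scikit-image
```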

3. Setup and data preparation

3.1 Original text-to-image setup

-> Preprocessed COCO metadata: COCO metadata provided by attnGAN

-> Pretrained DAMSM text/image encoder: DAMSM for COCO provided by attnGAN
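Both downloads are then extracted into the folders from the structure in Section 2. The archive names below are hypothetical (they follow the attnGAN repository), so substitute the filenames you actually downloaded:

```shell
# Archive names are hypothetical; use the files you downloaded.
unzip coco.zip -d data/                  # preprocessed COCO metadata -> data/coco/
unzip coco_DAMSM.zip -d DAMSMencoders/   # DAMSM text/image encoders -> DAMSMencoders/
```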

3.2 COCO2014 images for training and evaluation

```shell
# Training images
wget http://images.cocodataset.org/zips/train2014.zip
# Evaluation images
wget http://images.cocodataset.org/zips/val2014.zip
```
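The downloaded archives then need to be unpacked under data/coco/; the images/ subfolder below follows the upstream attnGAN data layout and is an assumption here:

```shell
# Target folder assumed from the upstream attnGAN layout (data/coco/images/).
unzip train2014.zip -d data/coco/images/
unzip val2014.zip -d data/coco/images/
```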

3.3 Preprocessed caption graphs and trained embeddings of VICTR

Processed caption graphs:

Trained graph embeddings: download with the provided google_drive.py script and unzip to data/coco/:

```shell
python google_drive.py 1lr7Mcw6R6cr5zYnjYJ_ckmnkR0ARYa3q victr_graph.zip
unzip victr_graph.zip -d data/coco/
```

4. Training

Go to the main code directory of the corresponding model and run the training command:
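This README does not spell out the training command itself; the invocation below is a sketch modeled on the evaluation command in Section 5, and cfg/coco_train.yml is a placeholder for whichever training yml each model ships in its cfg/ folder:

```shell
cd code/dmgan_victr   # or code/stackgan_victr, code/attngan_victr
# cfg/coco_train.yml is a placeholder name; use the actual training yml in cfg/
python main.py --cfg cfg/coco_train.yml --gpu 0 --use_sg
```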

The saved models will be available in the output folder. The number of training epochs and the checkpoint-saving interval can be changed by setting MAX_EPOCH and TRAIN.SNAPSHOT_INTERVAL in the corresponding training yml files:
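For example, in the training yml (the values here are illustrative, and the nesting of MAX_EPOCH under TRAIN is assumed):

```yaml
TRAIN:
  MAX_EPOCH: 120          # total training epochs (example value)
  SNAPSHOT_INTERVAL: 10   # save a checkpoint every 10 epochs (example value)
```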

5. Evaluation

  1. Point the TRAIN.NET_G and TRAIN.SG_ATTN entries in the evaluation yml files to your saved models (e.g. NET_G: '../models/netG_epoch_128.pth' and SG.SG_ATTN: '../models/attnsg_epoch_128.pth'; see the sketch below), and make sure B_VALIDATION is set to True, so that the coco2014 eval set is used for generation:
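A sketch of the relevant keys, using the key names and example paths above; the exact nesting of SG_ATTN and B_VALIDATION is an assumption, so check each model's yml file:

```yaml
TRAIN:
  NET_G: '../models/netG_epoch_128.pth'       # path to the saved generator
SG:
  SG_ATTN: '../models/attnsg_epoch_128.pth'   # saved SG attention weights (nesting assumed)
B_VALIDATION: True                            # generate on the coco2014 eval set
```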
  2. Run the following command:

```shell
python main.py --cfg cfg/eval_coco.yml --gpu 0 --use_sg
```

  3. Evaluation metrics: after running the evaluation code, the generated images can be found in a folder under the model path. R-precision for the generated images is calculated automatically during evaluation (using the evaluation code from DM-GAN). For IS and FID, we also directly use the evaluation scripts from DM-GAN.

References:

- Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., & Metaxas, D. (2017). StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. In ICCV 2017.
- Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., & He, X. (2018). AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. In CVPR 2018.
- Zhu, M., Pan, P., Chen, W., & Yang, Y. (2019). DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis. In CVPR 2019.