VICTR: Visual Information Captured Text Representation for Text-to-Image Generation Tasks
This repository contains the code for the paper VICTR: Visual Information Captured Text Representation for Text-to-Image Generation Tasks.
<h4 align="center"> <b>Han, C.*, Long, S.*, Luo, S., Wang, K., & Poon, J. (2020, December). <br/><a href="https://www.aclweb.org/anthology/2020.coling-main.277.pdf">VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks</a><br/>In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pp. 3107-3117</b> </h4>

1. Introduction
The proposed VICTR representation for text-to-image multimodal tasks contains two major types of embedding: (1) basic graph embedding (for objects, relations, and attributes) and (2) positional graph embedding (for objects and relations), which capture rich visual semantic information about the objects in the text description. This repository provides the integration of the proposed VICTR representation into three original text-to-image generation models: StackGAN, AttnGAN, and DM-GAN.
2. Main code structure and running requirements
```
Root ---> repository
├── code ---> the main code for the three models
│   ├── stackgan_victr ---> main code for StackGAN+VICTR
│   ├── attngan_victr ---> main code for AttnGAN+VICTR
│   └── dmgan_victr ---> main code for DM-GAN+VICTR
├── DAMSMencoders ---> pretrained DAMSM text/image encoders from AttnGAN
├── data
│   └── coco ---> COCO2014 images and related data files
│       ├── train ---> train-related data files
│       └── test ---> test-related data files
└── output ---> model output
```
Environment for running the code:
- python 3.6
- pytorch 1.4.0 (`pip install torch==1.4.0 torchvision==0.5.0`)
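A minimal environment setup could look like the following; using conda and the environment name `victr` are assumptions, and any Python 3.6 environment with the packages above works equally well:

```bash
# hypothetical setup: conda and the environment name are assumptions, not part of the original instructions
conda create -n victr python=3.6
conda activate victr
pip install torch==1.4.0 torchvision==0.5.0
```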
3. Setup and data preparation
3.1 Original text-to-image related setup
- Preprocessed COCO metadata: download and unzip it to `data/`
- Pretrained DAMSM text/image encoder (DAMSM for COCO provided by AttnGAN):
  - download and unzip it to `DAMSMencoders/`
  - for training of the DAMSM model, please refer to AttnGAN
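Once the two archives are downloaded, unpacking them is a plain `unzip` into the folders above; the archive names below are placeholders for whatever the downloaded files are actually called:

```bash
# archive names are hypothetical; substitute the real names of the downloaded files
unzip coco_metadata.zip -d data/
unzip coco_DAMSM_encoder.zip -d DAMSMencoders/
```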
3.2 COCO2014 images for training and evaluation
- Training: `wget http://images.cocodataset.org/zips/train2014.zip`
- Evaluation: `wget http://images.cocodataset.org/zips/val2014.zip`
- After downloading, unzip all the images under the `data/coco/images/` folder (see the sketch below)
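Put together, and assuming the commands are run from the repository root with `unzip` available, the download and extraction step might look like:

```bash
# download the COCO2014 training and validation images
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
# extract them under data/coco/images/
unzip train2014.zip -d data/coco/images/
unzip val2014.zip -d data/coco/images/
# if the code expects the .jpg files directly under data/coco/images/, flatten the extracted folders:
# mv data/coco/images/train2014/* data/coco/images/val2014/* data/coco/images/
```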
3.3 Preprocessed caption graphs and trained embeddings of VICTR
Processed caption graphs:
- Training: download with `python google_drive.py 1LVnM22QKO6hbCzQ173EjOvNBLCJ7JopP victr_sg_train.zip` and unzip to `data/coco/train/`
- Evaluation: download with `python google_drive.py 1KhJezwScr_yd7wfeyczSRjDuf7IYaNDp victr_sg_test.zip` and unzip to `data/coco/test/`

Trained graph embeddings: download with `python google_drive.py 1lr7Mcw6R6cr5zYnjYJ_ckmnkR0ARYa3q victr_graph.zip` and unzip to `data/coco/`
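For convenience, the whole sequence can be run from the repository root (this assumes `google_drive.py` sits in the root, as the commands above imply, and that `unzip` is installed):

```bash
# caption graphs for training
python google_drive.py 1LVnM22QKO6hbCzQ173EjOvNBLCJ7JopP victr_sg_train.zip
unzip victr_sg_train.zip -d data/coco/train/
# caption graphs for evaluation
python google_drive.py 1KhJezwScr_yd7wfeyczSRjDuf7IYaNDp victr_sg_test.zip
unzip victr_sg_test.zip -d data/coco/test/
# trained graph embeddings
python google_drive.py 1lr7Mcw6R6cr5zYnjYJ_ckmnkR0ARYa3q victr_graph.zip
unzip victr_graph.zip -d data/coco/
```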
4. Training
Go to the main code directory of the corresponding model and run the training command:
- attnGAN-VICTR: `cd code/attngan_victr` and `python main.py --cfg cfg/coco_attn2.yml --gpu 0 --use_sg`
- DM-GAN-VICTR: `cd code/dmgan_victr` and `python main.py --cfg cfg/coco_DMGAN.yml --gpu 0 --use_sg`
The saved models will be available in the `output` folder. The number of training epochs and the saving interval can be changed by specifying the values for `MAX_EPOCH` and `TRAIN.SNAPSHOT_INTERVAL` in the corresponding training yml files:
- attnGAN-VICTR: `code/attngan_victr/cfg/coco_attn2.yml`
- DM-GAN-VICTR: `code/dmgan_victr/cfg/coco_DMGAN.yml`
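If you prefer changing these values from the command line, a quick sketch with `sed` is shown below; the numbers are placeholders rather than recommended settings, and the same edit applies to the DM-GAN-VICTR config:

```bash
# illustrative only: train for 120 epochs and save a checkpoint every 5 epochs (attnGAN-VICTR config)
sed -i 's/MAX_EPOCH: .*/MAX_EPOCH: 120/' code/attngan_victr/cfg/coco_attn2.yml
sed -i 's/SNAPSHOT_INTERVAL: .*/SNAPSHOT_INTERVAL: 5/' code/attngan_victr/cfg/coco_attn2.yml
```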
5. Evaluation
- Set the paths to the saved models in `TRAIN.NET_G` and `TRAIN.SG_ATTN` in the evaluation yml files (e.g. `NET_G: '../models/netG_epoch_128.pth'` and `SG.SG_ATTN: '../models/attnsg_epoch_128.pth'`), and make sure `B_VALIDATION` is set to `True`, which will use the coco2014 eval set for generation:
  - attnGAN-VICTR: `code/attngan_victr/cfg/eval_coco.yml`
  - DM-GAN-VICTR: `code/dmgan_victr/cfg/eval_coco.yml`
- Run the following command: `python main.py --cfg cfg/eval_coco.yml --gpu 0 --use_sg`
- Evaluation metrics: after running the evaluation code, the generated images can be found in the folder under the model path. To evaluate the generated images, R-precision is calculated automatically during evaluation (using the evaluation code from DM-GAN). For IS and FID, we also directly use the evaluation scripts from DM-GAN.
References:
- StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [github]
- AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks [github]
- DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis [github]