Modality Disentangled Discriminator for Text-to-Image Synthesis

Introduction

This project page provides PyTorch code implementing the paper "Modality Disentangled Discriminator for Text-to-Image Synthesis".

How to use

Python

Data

  1. Download the metadata for birds and coco and save it to data/

    • python google_drive.py 1O_LtUP9sch09QH3s_EBAgLEctBQ5JBSJ ./data/bird.zip
    • python google_drive.py 1rSnbIGNDGZeHlsUlLdahj0RJ9oo6lgH9 ./data/coco.zip
  2. Download the birds image data. Extract them to data/birds/

    • cd data/birds
    • wget http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
    • tar -xvzf CUB_200_2011.tgz
  3. Download coco dataset and extract the images to data/coco/

    • cd data/coco
    • wget http://images.cocodataset.org/zips/train2014.zip
    • wget http://images.cocodataset.org/zips/val2014.zip
    • unzip train2014.zip
    • unzip val2014.zip
    • mv train2014 images
    • cp val2014/* images
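The COCO merge in step 3 can also be scripted. The sketch below is a minimal Python equivalent of the `mv` and `cp` commands, assuming both archives have already been unzipped under data/coco/. The function name `merge_coco_images` is hypothetical and is not part of this repository.

```python
import shutil
from pathlib import Path


def merge_coco_images(coco_root):
    """Merge train2014/ and val2014/ into a single images/ folder.

    Hypothetical helper mirroring `mv train2014 images; cp val2014/* images`.
    Returns the number of files in images/ afterwards.
    """
    root = Path(coco_root)
    images = root / "images"
    train = root / "train2014"
    val = root / "val2014"

    # Equivalent of `mv train2014 images` (skipped if images/ already exists).
    if train.is_dir() and not images.exists():
        train.rename(images)
    images.mkdir(parents=True, exist_ok=True)

    # Equivalent of `cp val2014/* images`; copying file by file avoids
    # shell argument-length limits on very large folders.
    if val.is_dir():
        for src in val.iterdir():
            if src.is_file():
                shutil.copy2(src, images / src.name)

    return sum(1 for p in images.iterdir() if p.is_file())
```

Calling `merge_coco_images("data/coco")` from the repository root would then leave all training and validation images in data/coco/images/.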

Pretrained Models

Training

Validation

  1. Image generation:
    • Go into the code/ folder
    • python main.py --cfg cfg/eval_bird_DMGANMDD.yml --gpu 0
    • python main.py --cfg cfg/eval_coco_DMGANMDD.yml --gpu 0
  2. Inception score (IS for bird, IS for coco):
    • cd DM-GAN-MDD/eval/IS/bird && CUDA_VISIBLE_DEVICES=0 python inception_score_bird.py --image_folder ../../../models/netG_DMGANMDD_bird
    • cd DM-GAN-MDD/eval/IS/coco && CUDA_VISIBLE_DEVICES=0 python inception_score_coco.py ../../../models/netG_DMGANMDD_coco
  3. FID:
    • cd DM-GAN-MDD/eval/FID && python fid_score.py --gpu 0 --path1 bird_val.npz --path2 ../../models/netG_DMGANMDD_bird
    • cd DM-GAN-MDD/eval/FID && python fid_score.py --gpu 0 --path1 coco_val.npz --path2 ../../models/netG_DMGANMDD_coco
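For orientation, the Inception Score measures how confident and how diverse a classifier's predictions are over the generated images: IS = exp(E_x[KL(p(y|x) || p(y))]). The sketch below computes this quantity in plain Python from class-probability vectors that are assumed to be already available. It is not the code in eval/IS, which runs an Inception network on the image folder; the function name is hypothetical.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities p(y|x).

    `probs` is a list of equal-length probability vectors, one per image.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]

    # Sum of KL divergences between each p(y|x) and p(y).
    total_kl = 0.0
    for p in probs:
        for k in range(num_classes):
            if p[k] > 0.0:  # 0 * log(0) is treated as 0
                total_kl += p[k] * math.log(p[k] / marginal[k])

    # Exponential of the mean KL divergence.
    return math.exp(total_kl / n)
```

With identical predictions for every image the score is 1, the minimum; with confident predictions spread evenly over K classes it reaches K.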

Performance

Following DM-GAN, we use the PyTorch implementation to measure the FID score.

| Model | R-precision↑ | IS↑ | PyTorch FID |
|---|---|---|---|
| bird_AttnGAN (paper) | 67.82% ± 4.43% | 4.36 ± 0.03 | 23.98 |
| bird_DMGAN (paper) | 72.31% ± 0.91% | 4.75 ± 0.07 | 16.09 |
| bird_DMGAN_MDD | 79.73% ± 0.68% | 4.86 ± 0.06 | 15.76 |
| coco_AttnGAN (paper) | 85.47% ± 3.69% | 25.89 ± 0.47 | 35.49 |
| coco_DMGAN (paper) | 88.56% ± 0.28% | 30.49 ± 0.57 | 32.64 |
| coco_DMGAN_MDD | 94.37% ± 0.36% | 34.46 ± 0.72 | 24.30 |
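
For reference, the FID column is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images, so lower values are better. In the standard definition, with the symbols below chosen here for illustration, (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the feature mean and covariance of the real and generated images:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The first term compares the feature means and the trace term compares the covariances. Judging by the file names alone, bird_val.npz and coco_val.npz presumably hold such precomputed statistics for the validation images; this is an assumption, not something stated on this page.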

License

This code is released under the MIT License (refer to the LICENSE file for details).