
Word-Level Fine-Grained Story Visualization

PyTorch implementation of Word-Level Fine-Grained Story Visualization. The goal is to generate a sequence of images that narrates each sentence of a multi-sentence story, with global consistency across dynamic scenes and characters.

Overview

Word-Level Fine-Grained Story Visualization.
Bowen Li, Thomas Lukasiewicz.
University of Oxford, TU Wien
ECCV 2022

Data

Download the Pororo dataset and extract the folder to data/pororo.
Download the Abstract Scenes dataset and extract the folder to data/abstract.
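Before training, a quick check of the expected folder layout can save a failed run. The snippet below is a minimal sketch and not part of this repository: the helper name `missing_data_dirs` is hypothetical, and it only checks that `data/pororo` and `data/abstract` exist and are non-empty, not that their contents are complete. Only the folder for the dataset you plan to use is strictly needed.

```python
from pathlib import Path

# Dataset folders described above, relative to the repository root.
EXPECTED_DIRS = ("data/pororo", "data/abstract")


def missing_data_dirs(repo_root="."):
    """Hypothetical helper: return the expected dataset folders that are
    absent or empty under repo_root. It does not validate file contents."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_DIRS:
        folder = root / rel
        # Count a folder as missing if it does not exist or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing
```

For example, calling `print(missing_data_dirs())` from the repository root lists any dataset folder that still needs to be downloaded and extracted.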

Training

All code was developed and tested on CentOS 7 with Python 3.7 (Anaconda) and PyTorch 1.1.

Text Encoder Pretraining

Please refer to ControlGAN for details on pretraining the text encoder. Pretraining is based on DAMSM, which maximizes the cosine similarity between the matching text and image pairs provided by the corresponding dataset.
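To make the matching objective more concrete, here is a minimal, pure-Python sketch of a sentence-level matching loss in the spirit of DAMSM: cosine similarities between every image and sentence vector in a batch are scaled by a smoothing factor, turned into softmax posteriors over the batch, and the negative log-likelihood of the true pairs is taken in both directions. This is an illustration under assumptions, not the code used here: it omits DAMSM's word-level attention term, the function names are hypothetical, and the default `gamma=10.0` is an assumed value.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-8)  # guard against zero-length vectors


def _nll_of_diagonal(scores):
    """Mean negative log-softmax of the diagonal entries, row by row.
    Row i scores item i against every candidate; the true match is column i."""
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum - row[i]
    return total / len(scores)


def sentence_matching_loss(image_vecs, sentence_vecs, gamma=10.0):
    """Hypothetical sentence-level matching loss. Pair i is
    (image_vecs[i], sentence_vecs[i]); all other pairs in the batch act as
    negatives. Returns the sum of the image->text and text->image terms."""
    sims = [[gamma * cosine(img, sent) for sent in sentence_vecs]
            for img in image_vecs]
    # Transpose so each row scores one sentence against all images.
    sims_t = [list(col) for col in zip(*sims)]
    return _nll_of_diagonal(sims) + _nll_of_diagonal(sims_t)
```

Matched pairs whose vectors point in the same direction yield a much lower loss than shuffled pairs, which is the behaviour pretraining pushes the text and image encoders toward.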

Our Model

Train the model on the Pororo dataset:

python main_pororo.py --cfg cfg/pororo.yml

Train the model on the Abstract Scenes dataset:

python main_abstract.py --cfg cfg/abstract.yml

The *.yml files contain the configuration for training and testing. If you store the datasets elsewhere, modify DATA_DIR to point to their location.
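If you prefer to change DATA_DIR from a script rather than by hand, a small helper can rewrite that one key. This is a hypothetical sketch, not part of the repository, and it assumes DATA_DIR appears as a top-level `DATA_DIR: ...` line in the .yml file; a full YAML parser would be more robust but is not in the Python standard library.

```python
import re

# Matches a top-level "DATA_DIR: <value>" line (no leading indentation).
_DATA_DIR_LINE = re.compile(r"^DATA_DIR:.*$", re.MULTILINE)


def set_data_dir(yml_text, new_dir):
    """Hypothetical helper: return yml_text with the top-level DATA_DIR value
    replaced by new_dir. Raises ValueError if no such line exists."""
    replacement = "DATA_DIR: '{}'".format(new_dir)
    # A function replacement avoids backslash escapes (e.g. Windows paths).
    updated, count = _DATA_DIR_LINE.subn(lambda _m: replacement, yml_text)
    if count == 0:
        raise ValueError("no top-level DATA_DIR entry found")
    return updated
```

For example, read cfg/pororo.yml into a string, pass it together with your data path to `set_data_dir`, and write the result back to the same file. All other keys are left untouched.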

Note that we evaluate our approach at a resolution of 64 × 64 on Pororo and 256 × 256 on Abstract Scenes, since Abstract Scenes provides higher-resolution ground-truth images. To generate 256 × 256 images, we repeat the same upsampling blocks in the generator and the same downsampling blocks in the discriminator.
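As a rough illustration of why repeating blocks reaches the larger resolution: if each upsampling block doubles the spatial size (an assumption here; the actual block design is defined in model.py), going from 64 × 64 to 256 × 256 takes log2(256 / 64) = 2 extra blocks, and the discriminator needs the same number of extra halving blocks. The sketch below uses a hypothetical nearest-neighbour `upsample2x` on a plain 2-D list as a stand-in for a learned upsampling block.

```python
def extra_blocks(base_size, target_size):
    """Number of additional x2 blocks needed to go from base_size to
    target_size (square images; target/base must be a power of two)."""
    ratio = target_size // base_size
    if ratio < 1 or target_size % base_size or ratio & (ratio - 1):
        raise ValueError("target/base must be a power of two")
    return ratio.bit_length() - 1  # exact integer log2 of a power of two


def upsample2x(grid):
    """Nearest-neighbour x2 upsampling of a 2-D list. A learned block would
    also apply convolutions; this only shows the change in spatial size."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))  # duplicate each row
    return out


def apply_blocks(grid, n_blocks):
    """Apply the same x2 block n_blocks times, mirroring the idea of
    repeating identical upsampling blocks."""
    for _ in range(n_blocks):
        grid = upsample2x(grid)
    return grid
```

With `extra_blocks(64, 256)` equal to 2, applying the block twice turns a 2 × 2 toy grid into an 8 × 8 one, the same factor of 4 that separates 64 × 64 from 256 × 256.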

Pretrained Text Encoder

Text Encoder for Pororo. Download and save it to textEncoder/.
Text Encoder for Abstract Scenes. Download and save it to textEncoder/.

Our Pretrained Models

Pororo. Download and save it to models/.
Abstract. Download and save it to models/.

Evaluation

Run the following commands to evaluate our approach on the Pororo and Abstract Scenes test sets. Each command generates images for all stories in the test set and computes both FID and FSD scores:

python main_pororo.py --cfg ./cfg/pororo.yml --eval_fid True

python main_abstract.py --cfg ./cfg/abstract.yml --eval_fid True

FID and FSD results will be saved in a .csv file.
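For reference, both scores are Fréchet distances between Gaussians fitted to features of real and generated samples; lower is better. The standard form is shown below, where μ_r, Σ_r and μ_g, Σ_g are the feature means and covariances of the real and generated data. FID computes it on per-image features, while FSD, as commonly defined, computes it on features of whole image sequences (stories). Which feature extractors this code uses is not stated here, so check inference.py for the exact setup.

```latex
d^2\big((\mu_r, \Sigma_r), (\mu_g, \Sigma_g)\big)
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```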

Code Structure

cfg/: contains *.yml files.
datasets/: data loaders.
main_pororo.py: the entry point for training and testing on Pororo.
main_abstract.py: the entry point for training and testing on Abstract Scenes.
trainer.py: creates the networks, runs training, and reports its progress.
model.py: defines the architecture.
inference.py: functions for evaluation.
miscc/utils.py: loss functions and additional helper functions.
miscc/config.py: creates the option list.

Citation

If you find this useful for your research, please cite our paper:

@article{li2022word,
  title={Word-Level Fine-Grained Story Visualization},
  author={Li, Bowen and Lukasiewicz, Thomas},
  journal={arXiv preprint arXiv:2208.02341},
  year={2022}
}

Acknowledgements

This code borrows from the StoryGAN and ControlGAN repositories. Many thanks.