Word-Level Fine-Grained Story Visualization

PyTorch implementation for Word-Level Fine-Grained Story Visualization. The goal is to generate a sequence of images narrating each sentence in a multi-sentence story, with global consistency across dynamic scenes and characters.

Overview

<img src="archi.jpg" width="940px" height="140px"/>

Word-Level Fine-Grained Story Visualization.
Bowen Li, Thomas Lukasiewicz.<br> University of Oxford, TU Wien <br> ECCV 2022 <br>

Data

  1. Download the Pororo dataset and extract it to data/pororo.
  2. Download the Abstract Scenes dataset and extract it to data/abstract (a quick sanity check for both paths is sketched below).
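
A minimal sketch for checking that both datasets sit where the training scripts expect them (the paths follow the layout above; the internal contents of each dataset folder are not verified here):

import os

# Expected dataset locations relative to the repository root.
for path in ("data/pororo", "data/abstract"):
    status = "found" if os.path.isdir(path) else "missing"
    print(f"{path}: {status}")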

Training

All code was developed and tested on CentOS 7 with Python 3.7 (Anaconda) and PyTorch 1.1.
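
For reference, a matching environment can be created with Anaconda roughly as follows (the environment name is arbitrary, and the cudatoolkit version should match your CUDA setup):

conda create -n storyviz python=3.7
conda activate storyviz
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch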

Text Encoder Pretraining

Our Model

python main_pororo.py --cfg cfg/pororo.yml
python main_abstract.py --cfg cfg/abstract.yml

The *.yml files contain the configuration for training and testing. If you store the datasets somewhere else, please modify DATA_DIR in the corresponding *.yml file to point to their location.
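
As an example, a minimal sketch of updating DATA_DIR programmatically (this assumes the configuration files are plain YAML; the path /path/to/data/pororo is a placeholder, and only the DATA_DIR key is taken from this README):

import yaml  # PyYAML

# Load the training/testing configuration used by main_pororo.py.
with open("cfg/pororo.yml") as f:
    cfg = yaml.safe_load(f)

# Point DATA_DIR at the extracted dataset location.
cfg["DATA_DIR"] = "/path/to/data/pororo"

# Write the updated configuration back to disk.
with open("cfg/pororo.yml", "w") as f:
    yaml.safe_dump(cfg, f)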

Note that we evaluate our approach at a resolution of 64×64 on Pororo and 256×256 on Abstract Scenes, as Abstract Scenes provides larger-scale ground-truth images. To work on images at 256×256, we repeat the same upsampling blocks in the generator and downsampling blocks in the discriminator.
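
As a rough illustration (not the repository's actual generator code), each nearest-neighbour upsampling block doubles the spatial resolution, so producing 256×256 instead of 64×64 output amounts to repeating two more such blocks:

import torch
import torch.nn as nn

def up_block(in_ch, out_ch):
    # One upsampling block: double the spatial size, then convolve.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# From a 4x4 feature map, four blocks reach 64x64; repeating two more
# blocks on top of that reaches 256x256.
to_64 = nn.Sequential(*[up_block(64, 64) for _ in range(4)])
to_256 = nn.Sequential(*[up_block(64, 64) for _ in range(6)])

x = torch.randn(1, 64, 4, 4)
print(to_64(x).shape)   # torch.Size([1, 64, 64, 64])
print(to_256(x).shape)  # torch.Size([1, 64, 256, 256])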

Pretrained Text Encoder

Pretrained Model (Ours)

Evaluation

python main_pororo.py --cfg ./cfg/pororo.yml --eval_fid True
python main_abstract.py --cfg ./cfg/abstract.yml --eval_fid True

FID and FSD results will be saved in a .csv file.
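
A minimal sketch for inspecting the saved scores (the file name and column layout below are assumptions; check the actual output path printed by the evaluation scripts):

import csv

# Hypothetical results file produced by the evaluation run.
with open("fid_fsd_results.csv") as f:
    for row in csv.reader(f):
        print(row)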

Code Structure

Citation

If you find this work useful for your research, please cite:

@article{li2022word,
  title={Word-Level Fine-Grained Story Visualization},
  author={Li, Bowen and Lukasiewicz, Thomas},
  journal={arXiv preprint arXiv:2208.02341},
  year={2022}
}

Acknowledgements

This code borrows from the StoryGAN and ControlGAN repositories. Many thanks.