Word-Level Fine-Grained Story Visualization
PyTorch implementation of Word-Level Fine-Grained Story Visualization. The goal is to generate a sequence of images that narrates each sentence of a multi-sentence story, with global consistency across dynamic scenes and characters.
Overview
<img src="archi.jpg" width="940px" height="140px"/>Word-Level Fine-Grained Story Visualization.
Bowen Li, Thomas Lukasiewicz.<br> University of Oxford, TU Wien <br> ECCV 2022 <br>
Data
- Download the Pororo dataset and extract the folder to data/pororo.
- Download the Abstract Scenes dataset and extract the folder to data/abstract.
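Before training, it can help to confirm that both folders are where the scripts expect them. The following is a minimal sketch, assuming the commands are run from the repository root; the helper name `missing_dataset_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Folder names taken from the Data section above.
EXPECTED_DIRS = ("data/pororo", "data/abstract")


def missing_dataset_dirs(root="."):
    """Return the expected dataset folders that do not exist under root."""
    return [d for d in EXPECTED_DIRS if not (Path(root) / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All dataset folders found.")
```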
Training
All code was developed and tested on CentOS 7 with Python 3.7 (Anaconda) and PyTorch 1.1.
Text Encoder Pretraining
- Please refer to ControlGAN for more details about pretraining the text encoder. The text encoder pretraining is based on DAMSM, which maximizes the cosine similarity between text and image pairs provided by the corresponding dataset.
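To illustrate the idea of maximizing cosine similarity between matching text and image pairs, here is a simplified sketch in plain Python. It is not the DAMSM implementation used by this repository or by ControlGAN: it covers only a sentence-level, text-to-image matching term on precomputed embeddings, and the function names and the `gamma` sharpening factor are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def matching_loss(text_embs, image_embs, gamma=10.0):
    """Average negative log-probability that each text selects its own image.

    text_embs[i] and image_embs[i] are assumed to be a matching pair;
    every other image in the batch serves as a negative example.
    gamma is a hypothetical factor that sharpens the softmax.
    """
    n = len(text_embs)
    total = 0.0
    for i in range(n):
        scores = [gamma * cosine(text_embs[i], image_embs[j]) for j in range(n)]
        # Numerically stable log-sum-exp over all images in the batch.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[i]
    return total / n


# Toy batch: aligned pairs should yield a lower loss than shuffled pairs.
texts = [[1.0, 0.0], [0.0, 1.0]]
images_aligned = [[0.9, 0.1], [0.1, 0.9]]
images_shuffled = [[0.1, 0.9], [0.9, 0.1]]
print(matching_loss(texts, images_aligned) < matching_loss(texts, images_shuffled))
```

Minimizing this loss pushes the similarity of each matching pair above the similarity of mismatched pairs in the same batch.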
Our Model
- Train the model on the Pororo dataset:
python main_pororo.py --cfg cfg/pororo.yml
- Train the model on the Abstract Scenes dataset:
python main_abstract.py --cfg cfg/abstract.yml
The *.yml files include the configuration for training and testing. If you store the datasets somewhere else, please modify DATA_DIR to point to that location.
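Editing the file by hand is usually enough. If you prefer to script the change, the sketch below rewrites the DATA_DIR entry in the text of a config file. It assumes DATA_DIR appears as a top-level `DATA_DIR: <path>` line, which should be checked against the actual *.yml files; the helper name `set_data_dir`, the `CONFIG_NAME` key, and the example paths are hypothetical.

```python
import re


def set_data_dir(yml_text, new_dir):
    """Return yml_text with the value of the top-level DATA_DIR key replaced.

    Assumes the config contains a line of the form `DATA_DIR: <path>`.
    """
    pattern = re.compile(r"^(DATA_DIR:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in paths.
    updated, count = pattern.subn(
        lambda m: m.group(1) + "'" + new_dir + "'", yml_text
    )
    if count == 0:
        raise ValueError("no DATA_DIR entry found")
    return updated


# Hypothetical config contents, for illustration only.
example = "CONFIG_NAME: 'pororo'\nDATA_DIR: './data/pororo'\n"
print(set_data_dir(example, "/mnt/datasets/pororo"))
```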
Note that we evaluate our approach at a resolution of 64 × 64 on Pororo and 256 × 256 on Abstract Scenes, as Abstract Scenes provides larger-scale ground-truth images. To work on images at a resolution of 256 × 256, we repeat the same upsampling blocks in the generator and downsampling blocks in the discriminator.
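As a rough sanity check on how many blocks need to be repeated, the sketch below counts the extra stages needed to go from one resolution to another. It assumes each upsampling or downsampling block changes the spatial size by a factor of 2, which is common but should be confirmed in model.py; the helper name `extra_blocks` is hypothetical.

```python
def extra_blocks(base_res, target_res, factor=2):
    """Count how many scale-`factor` blocks take base_res to target_res."""
    if base_res <= 0 or target_res < base_res:
        raise ValueError("resolutions must satisfy 0 < base_res <= target_res")
    count = 0
    res = base_res
    while res < target_res:
        res *= factor
        count += 1
    if res != target_res:
        raise ValueError("target_res is not reachable with this factor")
    return count


# Going from 64 x 64 to 256 x 256 under the factor-2 assumption.
print(extra_blocks(64, 256))  # 2
```

Under that assumption, the 256 × 256 setting needs two more upsampling blocks in the generator and two more downsampling blocks in the discriminator than the 64 × 64 setting.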
Pretrained Text Encoder
- Text Encoder for Pororo. Download and save it to textEncoder/.
- Text Encoder for Abstract Scenes. Download and save it to textEncoder/.
Pretrained Our Model
Evaluation
- Run the following commands to evaluate our approach on the Pororo and Abstract Scenes test dataset, including image generation of all stories in the test dataset, and calculation of both FID and FSD scores:
python main_pororo.py --cfg ./cfg/pororo.yml --eval_fid True
python main_abstract.py --cfg ./cfg/abstract.yml --eval_fid True
FID and FSD results will be saved in a .csv file.
Code Structure
- cfg/: contains *.yml files.
- datasets/: dataloader.
- main_pororo.py: the entry point for training and testing on Pororo.
- main_abstract.py: the entry point for training and testing on Abstract Scenes.
- trainer.py: creates the networks, harnesses and reports the progress of training.
- model.py: defines the architecture.
- inference.py: functions for evaluation.
- miscc/utils.py: loss functions and additional helper functions.
- miscc/config.py: creates the option list.
Citation
If you find this useful for your research, please cite the following.
@article{li2022word,
title={Word-Level Fine-Grained Story Visualization},
author={Li, Bowen and Lukasiewicz, Thomas},
journal={arXiv preprint arXiv:2208.02341},
year={2022}
}
Acknowledgements
This code borrows from the StoryGAN and ControlGAN repositories. Many thanks.