StylerDALLE

Official PyTorch implementation for the ICCV 2023 paper StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model.

Updates

17 Jul 2023: StylerDALLE is accepted at ICCV 2023.

16 Aug 2023: StylerDALLE is also accepted for presentation at the MMFM Workshop @ICCV. See you in Paris!

13 Sep 2023: Training code for StylerDALLE-1 and StylerDALLE-Ru released.

Setup

Environment

conda env create -f environment.yml
conda activate stylerdalle

Usage

Data Pre-Processing

Before training, we preprocess the COCO dataset by encoding the images into discrete tokens with a pretrained vector-quantized tokenizer. First, download the images (train2014, val2014) and the annotations.

Preprocess for StylerDALLE-1 with the officially released dVAE of DALL-E: python prep/prep_coco.py

Preprocess for StylerDALLE-Ru with the VQGAN of Ru-DALLE: python prep/prep_coco_ru.py
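The core operation these scripts perform is vector quantization: each image patch feature is replaced by the index of its nearest codebook entry, yielding a grid of discrete tokens. The following is a minimal NumPy sketch of that lookup; the function and variable names are illustrative, not taken from the actual dVAE or VQGAN code.

```python
import numpy as np

def quantize_patches(patch_feats, codebook):
    """Map each patch feature to the index of its nearest codebook entry,
    the core lookup inside a vector-quantized tokenizer (illustrative only)."""
    # patch_feats: (num_patches, dim); codebook: (vocab_size, dim)
    # Squared Euclidean distance between every patch and every code.
    dists = ((patch_feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # one discrete token id per patch

# Toy example: 4 patches, a codebook of 8 entries, 16-dim features.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))
# Build patches near known codes so the expected token ids are clear.
patches = codebook[[3, 1, 7, 3]] + 0.01 * rng.normal(size=(4, 16))
tokens = quantize_patches(patches, codebook)
print(tokens.tolist())  # recovers the indices the patches were built from
```

The real tokenizers (DALL-E's dVAE, Ru-DALLE's VQGAN) learn both the encoder and the codebook, but the resulting token grid is used the same way: as a discrete sequence the style-transfer model can operate on.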

In addition, for reinforcement learning, we prepare the caption data files here. They are derived from the original COCO annotations but contain only the captions of successfully preprocessed images.
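The filtering step behind these caption files can be sketched as follows, assuming COCO-style caption annotations; the function name and toy data are hypothetical, not the repository's actual code.

```python
def filter_captions(annotations, kept_image_ids):
    """Keep only captions whose image was successfully preprocessed,
    mirroring how the RL caption files are derived from COCO annotations."""
    kept = set(kept_image_ids)
    return [a for a in annotations["annotations"] if a["image_id"] in kept]

# Toy COCO-style annotation structure (hypothetical data).
coco = {
    "annotations": [
        {"image_id": 1, "caption": "a cat sitting on a mat"},
        {"image_id": 2, "caption": "a dog running in a park"},
    ]
}
# Suppose only image 1 was encoded successfully during preprocessing.
filtered = filter_captions(coco, kept_image_ids=[1])
print(len(filtered))  # 1
```

This keeps the caption files aligned with the token data, so reinforcement-learning batches never reference an image that has no encoded tokens.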

Train StylerDALLE-1

Train StylerDALLE-Ru

Reference

@InProceedings{Xu2023StylerDALLE,
    author    = {Xu, Zipeng and Sangineto, Enver and Sebe, Nicu},
    title     = {StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {7601-7611}
}