Text Guided Style Transfer

This is my attempt at implementing Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer.

Motivation

I got interested in the possibility of manipulating pretrained text2img models for video editing. I googled "CLIP-based style transfer" and stumbled upon this paper, which didn't have an open implementation, so I decided to write one myself.

Project setup

Clone submodules:

git clone https://github.com/openai/CLIP
git clone https://github.com/ouhenio/guided-diffusion.git

Install the submodule dependencies:

pip install -e ./CLIP && pip install -e ./guided-diffusion

Download the unconditional diffusion model weights (2.06 GB):

wget -O unconditional_diffusion.pt https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt
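
For reference, this checkpoint can be loaded through guided-diffusion's script_util helpers. The following is only a minimal sketch using the standard flags published for the 256x256 unconditional model; main.py may configure things differently.

import torch
from guided_diffusion.script_util import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

# Standard flags for the 256x256 unconditional checkpoint (see the
# guided-diffusion README); the configuration in main.py may differ.
config = model_and_diffusion_defaults()
config.update({
    "image_size": 256,
    "class_cond": False,
    "learn_sigma": True,
    "num_channels": 256,
    "num_head_channels": 64,
    "num_res_blocks": 2,
    "attention_resolutions": "32,16,8",
    "resblock_updown": True,
    "use_scale_shift_norm": True,
    "diffusion_steps": 1000,
    "noise_schedule": "linear",
})

model, diffusion = create_model_and_diffusion(**config)
model.load_state_dict(torch.load("unconditional_diffusion.pt", map_location="cpu"))
model.eval()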

Usage

Sadly, the usage interface is pretty lacking:

python main.py

To try different styles, hyperparameters, and images, edit these lines in main.py (a sketch of the kind of values they hold follows the list):

line 139: guidance prompt
line 155: initial image
line 216: loss hyperparameters
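
The names below are hypothetical and only illustrate what those lines contain; the actual variables in main.py may be named and structured differently. The weights mirror the "cubism" row in the table below.

# Hypothetical sketch; variable names and paths are examples, not main.py's.
prompt = "cubism"                    # line 139: guidance prompt
init_image = "images/portrait.png"   # line 155: initial image
loss_weights = {                     # line 216: loss hyperparameters
    "global": 20000,
    "directional": 15000,
    "feature": 50,
    "mse": 3000,
    "zecon": 10,
}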

Example Results

Image | Prompt | Global Loss | Directional Loss | Feature Loss | MSE Loss | ZeCon Loss
(image) | portrait | None | None | None | None | None
(image) | cubism | 20000 | 15000 | 50 | 3000 | 10
(image) | 3d render in the style of Pixar | 5000 | 5000 | 100 | 10000 | 500
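
The weights above scale the individual loss terms before they are summed into the guidance signal. Below is a minimal sketch of that weighted sum, assuming each term has already been computed as a scalar tensor; the function and key names are mine, not the paper's or main.py's.

import torch

def combine_losses(terms, weights):
    # terms and weights are dicts keyed by loss name ("global", "directional",
    # "feature", "mse", "zecon"). A weight of None or 0 disables that term,
    # as in the first row of the table.
    total = torch.zeros(())
    for name, value in terms.items():
        weight = weights.get(name)
        if weight:
            total = total + weight * value
    return total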

Final thoughts

I've found that this method kind of works, but it is very sensitive to hyperparameters, which makes it frustrating to use.

Table 5 of the paper makes me fairly confident that the authors ran into the same issue.