Home

Awesome

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

NeurIPS ArXiv Website Demo

This repository contains our official implementation of the NeurIPS 2023 paper: DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models, which can generate high-quality vector sketches based on text prompts.

Our Project Page: https://ximinng.github.io/DiffSketcher-project/

teaser1 teaser2

DiffSketcher Rendering Process:

<img src="./img/0.gif" style="width: 200px; height: 200px;"><img src="./img/1.gif" style="width: 200px; height: 200px;"><img src="./img/2.gif" style="width: 200px; height: 200px;">
Prompt: Macaw full color, ultra detailed, realistic, insanely beautifulPrompt: Very detailed masterpiece painting of baby yoda hoding a lightsaberPrompt: Sailboat sailing in the sea on a clear day

:new: Update

:wrench: Installation

Step by step

Create a new conda environment:

conda create --name diffsketcher python=3.10
conda activate diffsketcher

Install pytorch and the following libraries:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
pip install omegaconf BeautifulSoup4
pip install opencv-python scikit-image matplotlib visdom wandb
pip install triton numba
pip install numpy scipy timm scikit-fmm einops
pip install accelerate transformers safetensors datasets

Install CLIP:

pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git

Install diffusers:

pip install diffusers==0.20.2

Install xformers (require python=3.10):

conda install xformers -c xformers

Install diffvg:

git clone https://github.com/BachiLi/diffvg.git
cd diffvg
git submodule update --init --recursive
conda install -y -c anaconda cmake
conda install -y -c conda-forge ffmpeg
pip install svgwrite svgpathtools cssutils torch-tools
python setup.py install

Docker Usage

docker run --name diffsketcher --gpus all -it --ipc=host ximingxing/svgrender:v1 /bin/bash

🔥 Quickstart

Case: Sydney Opera House

Preview:

Attention MapControl Points InitStrokes Initialization100 step500 step
<img src="./img/SydneyOperaHouse/attn-map.png" style="width: 200px; height: 200px;"><img src="./img/SydneyOperaHouse/points-init.png" style="width: 200px; height: 200px;"><img src="./img/SydneyOperaHouse/svg_iter0.svg" style="width: 200px; height: 200px;"><img src="./img/SydneyOperaHouse/svg_iter100.svg" style="width: 200px; height: 200px;"><img src="./img/SydneyOperaHouse/visual_best_96P.svg" style="width: 200px; height: 200px;">

From the abstract to the concrete:

16 Paths36 Paths48 Paths96 Paths128 Paths
<img src="./img/SydneyOperaHouse/visual_best_16P.svg"><img src="./img/SydneyOperaHouse/visual_best_36P.svg"><img src="./img/SydneyOperaHouse/visual_best_48P.svg"><img src="./img/SydneyOperaHouse/visual_best_96P.svg"><img src="./img/SydneyOperaHouse/visual_best_128P.svg">

Script:

python run_painterly_render.py \
  -c diffsketcher.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=4 num_paths=96 num_iter=800" \
  -pt "a photo of Sydney opera house" \
  -respath ./workdir/sydney_opera_house \
  -d 8019 \
  --download

crucial:

optional:

Case: Sydney Opera House in ink painting style

Preview:

<img src="./img/SydneyOperaHouse-ink/svg_iter0.svg"><img src="./img/SydneyOperaHouse-ink/svg_iter100.svg"><img src="./img/SydneyOperaHouse-ink/svg_iter200.svg"><img src="./img/SydneyOperaHouse-ink/visual_best.svg">
Strokes Initialization100 step200 step990 step

Script:

python run_painterly_render.py \
  -c diffsketcher-width.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=4 num_paths=48 num_iter=800" \
  -pt "a photo of Sydney opera house" \
  -respath ./workdir/sydney_opera_house_ink \
  -d 8019 \
  --download

Oil Painting

Preview:

<img src="./img/LatinWomanPortrait/svg_iter0.svg"><img src="./img/LatinWomanPortrait/svg_iter100.svg"><img src="./img/LatinWomanPortrait/visual_best.svg">
Strokes Initialization100 step570 step

Script:

python run_painterly_render.py \
  -c diffsketcher-color.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=5 num_paths=1000 num_iter=1000 guidance_scale=7.5" \
  -pt "portrait of latin woman having a spiritual awaking, eyes closed, slight smile, illuminating lights, oil painting, by Van Gogh" \
  -npt "text, signature, title, heading, watermark, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, out of frame, ugly, extra limbs, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, mutated hands, fused fingers, too many fingers, long neck" \
  -respath ./workdir/latin_woman_portrait -d 58548

Preview:

<img src="./img/WomanWithCrown/svg_iter0.svg"><img src="./img/WomanWithCrown/svg_iter100.svg"><img src="./img/WomanWithCrown/visual_best.svg">
Strokes Initialization100 step570 step

Script:

python run_painterly_render.py \
  -c diffsketcher-color.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=5 num_paths=1000 num_iter=1000 guidance_scale=7.5" \
  -pt "a painting of a woman with a crown on her head, art station front page, dynamic portrait style, many colors in the background, olpntng style, oil painting, forbidden beauty" \
  -npt "2 heads, 2 faces, cropped image, out of frame, draft, deformed hands, twisted fingers, double image, malformed hands, multiple heads, extra limb, ugly, poorly drawn hands, missing limb, disfigured, cut-off, ugly, grain, low-res, Deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, floating limbs, disconnected limbs, disgusting, poorly drawn, mutilated, mangled, extra fingers, duplicate artifacts, morbid, gross proportions, missing arms, mutated hands, mutilated hands, cloned face, malformed, blur haze" \
  -respath ./workdir/woman_with_crown -d 178351

Preview:

<img src="./img/BeautifulGirl_OilPainting/svg_iter0.svg"><img src="./img/BeautifulGirl_OilPainting/svg_iter100.svg"><img src="./img/BeautifulGirl_OilPainting/visual_best.svg">
Strokes Initialization100 step420 step

Script:

python run_painterly_render.py \
  -c diffsketcher-color.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=5 num_paths=1000 num_iter=1000 guidance_scale=7.5" \
  -pt "a painting of a woman with a crown on her head, art station front page, dynamic portrait style, many colors in the background, olpntng style, oil painting, forbidden beauty" \
  -npt "2 heads, 2 faces, cropped image, out of frame, draft, deformed hands, twisted fingers, double image, malformed hands, multiple heads, extra limb, ugly, poorly drawn hands, missing limb, disfigured, cut-off, ugly, grain, low-res, Deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, floating limbs, disconnected limbs, disgusting, poorly drawn, mutilated, mangled, extra fingers, duplicate artifacts, morbid, gross proportions, missing arms, mutated hands, mutilated hands, cloned face, malformed, blur haze" \
  -respath ./workdir/woman_with_crown -d 178351

Colorful Results

Preview:

<img src="./img/castle-rgba/svg_iter0.svg"><img src="./img/castle-rgba/svg_iter100.svg"><img src="./img/castle-rgba/visual_best.svg">
Strokes Initialization100 step340 step

Script:

python run_painterly_render.py \
  -c diffsketcher-color.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=5 num_paths=1000 num_iter=800 guidance_scale=7" \
  -pt "a beautiful snow-covered castle, a stunning masterpiece, trees, rays of the sun, Leonid Afremov" \
  -npt "poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, overexposed, bad art, beginner, amateur, distorted face" \
  -respath ./workdir/castle -d 370880

Preview:

<img src="./img/castle-rgba-2/svg_iter0.svg"><img src="./img/castle-rgba-2/svg_iter100.svg"><img src="./img/castle-rgba-2/visual_best.svg">
Strokes Initialization100 step850 step

Script:

python run_painterly_render.py \
  -c diffsketcher-color.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=5 num_paths=1000 num_iter=800 guidance_scale=7" \
  -pt "a beautiful snow-covered castle, a stunning masterpiece, trees, rays of the sun, Leonid Afremov" \
  -npt "poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, overexposed, bad art, beginner, amateur, distorted face" \
  -respath ./workdir/castle -d 478376

DiffSketcher + Style Transfer

Preview:

<img src="./img/FrenchRevolution-ST/generated.png" style="width: 250px; height: 250px;"><img src="./img/starry.jpg" style="width: 250px; height: 250px;"><img src="./img/FrenchRevolution-ST/french_ST.svg" style="width: 250px; height: 250px;">
Generated sampleStyle ImageResult

Script:

python run_painterly_render.py \
  -tk style-diffsketcher -c diffsketcher-style.yaml \
  -eval_step 10 -save_step 10 \
  -update "token_ind=4 num_paths=2000 style_warmup=0 style_strength=1 softmax_temp=0.4 sds.grad_scale=0 lr_scheduler=True num_iter=2000" \
  -pt "The French Revolution, highly detailed, 8k, ornate, intricate, cinematic, dehazed, atmospheric, oil painting, by Van Gogh" \
  -style ./img/starry.jpg \
  -respath ./workdir/style_transfer \
  -d 876809

More Sketch Results

check the Examples.md for more cases.

TODO

:books: Acknowledgement

The project is built based on the following repository:

We gratefully thank the authors for their wonderful works.

:paperclip: Citation

If you use this code for your research, please cite the following work:

@inproceedings{xing2023diffsketcher,
    title={DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models},
    author={XiMing Xing and Chuang Wang and Haitao Zhou and Jing Zhang and Qian Yu and Dong Xu},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
    url={https://openreview.net/forum?id=CY1xatvEQj}
}

:copyright: Licence

This work is licensed under a MIT License.