UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

<a href='https://arxiv.org/abs/2312.04884'><img src='https://img.shields.io/badge/Arxiv-2312.04884-DF826C'></a> <a href='https://udifftext.github.io/'><img src='https://img.shields.io/badge/Project-UDiffText-D0F288'></a> <a href='https://huggingface.co/spaces/ZYMPKU/UDiffText'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Demo-UDiffText-8ADAB2'></a>

Our proposed UDiffText synthesizes accurate and harmonious text in either synthetic or real-world images, so it can be applied to tasks such as scene text editing (a), arbitrary text generation (b), and accurate T2I generation (c).

UDiffText Teaser

📬 News

🔨 Installation

  1. Clone this repo:
git clone https://github.com/ZYM-PKU/UDiffText.git
cd UDiffText
  2. Install the required Python packages:
conda create -n udiff python=3.11
conda activate udiff
pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
  3. Make the checkpoint directory and build the tree structure:
mkdir ./checkpoints

checkpoints
├── AEs                    // AutoEncoder
├── encoders             
    ├── LabelEncoder       // Character-level encoder
    └── ViTSTR             // STR encoder
├── predictors             // STR model
├── pretrained             // Pretrained SD
└── ***.ckpt               // UDiffText checkpoint
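
The directory part of this tree can also be created from Python. The following is a minimal sketch, not part of the repository: the helper name `build_checkpoint_tree` and the `CHECKPOINT_SUBDIRS` list are illustrative, and the sub-directory names are copied from the tree above. The `***.ckpt` entry is a downloaded file, so it is not created here.

```python
from pathlib import Path

# Sub-directories copied from the tree above (hypothetical constant name).
CHECKPOINT_SUBDIRS = [
    "AEs",
    "encoders/LabelEncoder",
    "encoders/ViTSTR",
    "predictors",
    "pretrained",
]


def build_checkpoint_tree(root):
    """Create the expected sub-directories under `root` and return their paths."""
    root = Path(root)
    created = []
    for sub in CHECKPOINT_SUBDIRS:
        path = root / sub
        # parents/exist_ok make the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `build_checkpoint_tree("./checkpoints")` from the repository root should leave an existing tree untouched and only add the directories that are missing.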

💻 Training

  1. Prepare your data

LAION-OCR

ICDAR13

ICDAR13
├── train                  // training set
    ├── annos              // annotations
        ├── gt_x.txt
        ├── ...
    └── images             // images
        ├── img_x.jpg
        ├── ...
└── val                    // validation set
    ├── annos              // annotations
        ├── gt_img_x.txt
        ├── ...
    └── images             // images
        ├── img_x.jpg
        ├── ...
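
Before training, it can help to confirm that each split contains both annotations and images. The sketch below only counts files; it assumes `.txt` annotations and `.jpg` images as shown in the tree, and it deliberately does not assume how annotation names map to image names. The function name `summarize_icdar13` is illustrative and not part of the repository.

```python
from pathlib import Path


def summarize_icdar13(root):
    """Count annotation and image files in each split of an ICDAR13-style tree."""
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        # A missing directory simply yields zero matches.
        annos = list((root / split / "annos").glob("*.txt"))
        images = list((root / split / "images").glob("*.jpg"))
        summary[split] = {"annos": len(annos), "images": len(images)}
    return summary
```

A split whose two counts differ, or are both zero, is worth inspecting before starting a run.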

TextSeg

TextSeg
├── train                  // training set
    ├── annotation         // annotations
        ├── x_anno.json    // annotation json file
        ├── x_mask.png     // character-level mask
        ├── ...
    └── image              // images
        ├── x.jpg
        ├── ...
└── val                    // validation set
    ├── annotation         // annotations
        ├── x_anno.json    // annotation json file
        ├── x_mask.png     // character-level mask
        ├── ...
    └── image              // images
        ├── x.jpg
        ├── ...
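
Each TextSeg sample consists of an image plus an annotation JSON file and a character-level mask. The sketch below lists the annotation files missing for a given split. It assumes images are named `x.jpg` and that the matching files are `x_anno.json` and `x_mask.png`; the function name `find_incomplete_samples` is illustrative and not part of the repository.

```python
from pathlib import Path


def find_incomplete_samples(root, split="train"):
    """Return the annotation file names missing for images in one TextSeg split."""
    split_dir = Path(root) / split
    missing = []
    for image in sorted((split_dir / "image").glob("*.jpg")):
        stem = image.stem  # "x" for "x.jpg"
        for suffix in ("_anno.json", "_mask.png"):
            name = f"{stem}{suffix}"
            if not (split_dir / "annotation" / name).is_file():
                missing.append(name)
    return missing
```

An empty list means every image in the split has both its JSON annotation and its mask.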

SynthText

SynthText
├── 1                      // part 1
    ├── ant+hill_1_0.jpg   // image
    ├── ant+hill_1_1.jpg
    ├── ...
├── 2                      // part 2
├── ...
└── gt.mat                 // annotation file
  2. Train the character-level encoder

Set the parameters in ./configs/pretrain.yaml and run:

python pretrain.py
  3. Train the UDiffText model

Download the pretrained model and put it in ./checkpoints/pretrained/. You can ignore any "Missing Key" or "Unexpected Key" warnings when loading the checkpoint.

Set the parameters in ./configs/train.yaml, especially the paths:

load_ckpt_path: ./checkpoints/pretrained/512-inpainting-ema.ckpt # Checkpoint of the pretrained SD
model_cfg_path: ./configs/train/textdesign_sd_2.yaml # UDiffText model config
dataset_cfg_path: ./configs/dataset/locr.yaml # Use the LAION-OCR dataset
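
A wrong path here usually surfaces only after the script has started loading, so a quick pre-flight check can save time. The sketch below takes the three path settings as a plain dict and reports the ones that are unset or do not exist on disk. It does not read the YAML file itself, and the helper name `missing_paths` is illustrative, not part of the repository.

```python
from pathlib import Path

# Keys taken from the config excerpt above.
REQUIRED_KEYS = ("load_ckpt_path", "model_cfg_path", "dataset_cfg_path")


def missing_paths(cfg, base_dir="."):
    """Return {key: reason} for path settings that are unset or not found on disk."""
    problems = {}
    for key in REQUIRED_KEYS:
        value = cfg.get(key)
        if not value:
            problems[key] = "not set"
        elif not (Path(base_dir) / value).exists():
            problems[key] = f"not found: {value}"
    return problems
```

An empty result means all three paths resolve relative to `base_dir`, which would normally be the repository root.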

and run:

python train.py

📏 Evaluation

  1. Download our available checkpoints and put them in the corresponding directories in ./checkpoints.

  2. Set the parameters in ./configs/test.yaml, especially the paths:

load_ckpt_path: "./checkpoints/***.ckpt"  # UDiffText checkpoint
model_cfg_path: "./configs/test/textdesign_sd_2.yaml"  # UDiffText model config
dataset_cfg_path: "./configs/dataset/locr.yaml"  # LAION-OCR dataset config
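
Since the checkpoint file name depends on which release you downloaded, one way to find the value for load_ckpt_path is to list the `.ckpt` files sitting directly under ./checkpoints. This is a minimal sketch; the function name `list_checkpoints` is illustrative and not part of the repository, and files inside sub-directories such as `pretrained` are intentionally skipped.

```python
from pathlib import Path


def list_checkpoints(root="./checkpoints"):
    """Return the .ckpt files directly under `root`, sorted by name."""
    # glob("*.ckpt") is non-recursive, so nested checkpoints are not returned.
    return sorted(p for p in Path(root).glob("*.ckpt") if p.is_file())
```

If the list is empty, the UDiffText checkpoint has not yet been placed in the top-level checkpoint directory.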

and run:

python test.py

🖼️ Demo

To run an interactive demo on your own machine, execute:

python demo.py

or try our online demo on Hugging Face: https://huggingface.co/spaces/ZYMPKU/UDiffText

🎉 Acknowledgement

🪬 Citation

@misc{zhao2023udifftext,
      title={UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models}, 
      author={Yiming Zhao and Zhouhui Lian},
      year={2023},
      eprint={2312.04884},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}