StyleCrafter-SDXL

<div align="center">

<a href='https://arxiv.org/abs/2312.00330'><img src='https://img.shields.io/badge/arXiv-2312.00330-b31b1b.svg'></a>       <a href='https://gongyeliu.github.io/StyleCrafter.github.io/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>       <a href='https://huggingface.co/spaces/liuhuohuo/StyleCrafter'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a>      <br> <a href='https://github.com/GongyeLiu/StyleCrafter'><img src='https://img.shields.io/badge/StyleCrafter-VideoCrafter-darkcyan'></a>       <a href='https://github.com/GongyeLiu/StyleCrafter-SDXL'><img src='https://img.shields.io/badge/StyleCrafter-SDXL-darkcyan'></a>      

</div>

🔆 Introduction

Hi, this is the official implementation of StyleCrafter on SDXL. We train StyleCrafter on SDXL to further improve generation quality for style-guided image generation.

TL;DR: Higher Resolution(1024×1024)! More Visually Pleasing!

⭐ Showcases

<div align="center"> <img src=".asset/teaser.png"> <p>Style-guided text-to-image results. Resolution: 1024 x 1024. (Compressed)</p> </div>

⚙️ Setup

Step 1: Install Python Environment

conda create -n style_crafter python=3.9
conda activate style_crafter

conda install cudatoolkit=11.8 cudnn

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install diffusers==0.29.1
pip install accelerate==0.31.0
pip install transformers tensorboard omegaconf opencv-python webdataset
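The pinned versions of diffusers and accelerate above matter. A small stdlib check like the following (package names taken from the pip commands above; this helper is not part of the repo) can verify that the environment matches:

```python
import importlib.metadata as md

# Version pins from the install commands above.
REQUIRED = {"diffusers": "0.29.1", "accelerate": "0.31.0"}

def check_versions(required):
    """Report packages that are missing or do not match the pinned version."""
    problems = {}
    for pkg, want in required.items():
        try:
            have = md.version(pkg)
        except md.PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        if have != want:
            problems[pkg] = f"have {have}, want {want}"
    return problems

if __name__ == "__main__":
    print(check_versions(REQUIRED) or "environment OK")
```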

Step 2: Download checkpoints

Download the StyleCrafter-SDXL checkpoints from Hugging Face and put them into the folder ./pretrained_ckpts/.

After downloading and moving, the directory structure should look like this:

pretrained_ckpts
├── image_encoder
│   ├── config.json
│   └── pytorch_model.bin
└── stylecrafter
    └── stylecrafter_sdxl.ckpt
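After placing the files, a quick stdlib check (the paths mirror the tree above; this helper is not part of the repo) can confirm the layout before running inference:

```python
from pathlib import Path

# Files expected under ./pretrained_ckpts/, mirroring the tree above.
EXPECTED = [
    "image_encoder/config.json",
    "image_encoder/pytorch_model.bin",
    "stylecrafter/stylecrafter_sdxl.ckpt",
]

def check_ckpts(root="pretrained_ckpts"):
    """Return the list of expected checkpoint files that are missing."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).is_file()]

if __name__ == "__main__":
    missing = check_ckpts()
    print("Missing files: " + ", ".join(missing) if missing
          else "All checkpoints in place.")
```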

💫 Inference

Run the following command to generate stylized images.

python infer.py --style_dir testing_data/input_style \
  --prompts_file testing_data/prompts.txt \
  --save_dir testing_data/output \
  --scale 0.5

If the results are unsatisfactory, try slightly adjusting the scale value: empirically, reduce the scale if artifacts appear, and increase it if the result is not stylized enough.
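Since the best scale varies per style image, a small sweep can help. The sketch below only builds one command per scale, each writing to its own output folder (the 0.3–0.7 range and the scale_* folder names are illustrative choices; the infer.py flags are those from the command above):

```python
import shlex
import subprocess

def build_commands(scales, style_dir="testing_data/input_style",
                   prompts_file="testing_data/prompts.txt",
                   save_root="testing_data/output"):
    """Build one infer.py command per scale, saving each run separately."""
    cmds = []
    for s in scales:
        cmds.append(
            f"python infer.py --style_dir {style_dir} "
            f"--prompts_file {prompts_file} "
            f"--save_dir {save_root}/scale_{s} --scale {s}"
        )
    return cmds

for cmd in build_commands([0.3, 0.5, 0.7]):
    print(cmd)
    # subprocess.run(shlex.split(cmd), check=True)  # uncomment to actually run
```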

💥 Training

  1. Prepare your training data in WebDataset format, or modify dataset.py to adapt it to your data as preferred.

  2. Launch the training script (based on accelerate):

sh train.sh
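If your data is not in WebDataset format yet: shards are plain tar files whose members share a key and differ by extension (e.g. 000000.jpg plus 000000.txt form one image/caption sample). A stdlib-only packing sketch follows; the .jpg/.txt field names and the shard name are assumptions, so adjust them to whatever keys dataset.py expects:

```python
import io
import tarfile

def write_shard(pairs, shard_path):
    """Pack (image_bytes, caption) pairs into one WebDataset-style tar shard.

    Members sharing a zero-padded key with different extensions
    (e.g. 000000.jpg / 000000.txt) are grouped into one sample.
    """
    with tarfile.open(shard_path, "w") as tar:
        for i, (image_bytes, caption) in enumerate(pairs):
            key = f"{i:06d}"
            for name, data in ((f"{key}.jpg", image_bytes),
                               (f"{key}.txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

# Example: pack two dummy samples into one shard.
write_shard([(b"\xff\xd8fake-jpeg", "an oil painting of a fox"),
             (b"\xff\xd8fake-jpeg", "a watercolor mountain")],
            "shard-000000.tar")
```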

📝 Training Details

As a reference, we trained StyleCrafter-SDXL on 8 NVIDIA A100 GPUs; the full training takes roughly a week to complete (an approximation).

For more details (model architecture, data processing, etc.), please refer to our paper.

🧰 More about StyleCrafter

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter <br> Gongye Liu, Menghan Xia*, Yong Zhang, Haoxin Chen, Jinbo Xing, Xintao Wang,
Ying Shan, Yujiu Yang* <br> (* corresponding authors)

<br>

StyleCrafter Github Repo(based on VideoCrafter)

<br>

StyleCrafter Homepage

📢 Disclaimer

We developed this repository for RESEARCH purposes, so it may only be used for personal, research, or other non-commercial purposes.


🙏 Acknowledgements

This repo is based on diffusers and accelerate, and our training code for SDXL is largely modified from IP-Adapter. We would like to thank them for their awesome contributions to the AIGC community.

📭 Contact

If you have any comments or questions, feel free to contact lgy22@mails.tsinghua.edu.cn.

BibTeX

@article{liu2023stylecrafter,
  title={StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter},
  author={Liu, Gongye and Xia, Menghan and Zhang, Yong and Chen, Haoxin and Xing, Jinbo and Wang, Xintao and Yang, Yujiu and Shan, Ying},
  journal={arXiv preprint arXiv:2312.00330},
  year={2023}
}