
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

🔥🔥🔥 StyleCrafter on SDXL for stylized image generation is now available, enabling higher resolution (1024×1024) and more visually pleasing results!

<div align="center">

<a href='https://arxiv.org/abs/2312.00330'><img src='https://img.shields.io/badge/arXiv-2312.00330-b31b1b.svg'></a> &nbsp; <a href='https://gongyeliu.github.io/StyleCrafter.github.io/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp; <a href='https://huggingface.co/spaces/liuhuohuo/StyleCrafter'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a> <br> <a href='https://github.com/GongyeLiu/StyleCrafter'><img src='https://img.shields.io/badge/StyleCrafter-VideoCrafter-darkcyan'></a> &nbsp; <a href='https://github.com/GongyeLiu/StyleCrafter-SDXL'><img src='https://img.shields.io/badge/StyleCrafter-SDXL-darkcyan'></a>

GongyeLiu, Menghan Xia*, Yong Zhang, Haoxin Chen, Jinbo Xing, <br>Xintao Wang, Yujiu Yang*, Ying Shan <br><br> (* corresponding authors)

From Tsinghua University and Tencent AI Lab.

</div>

🔆 Introduction

TL;DR: We propose StyleCrafter, a generic method that enhances pre-trained T2V models with style control, supporting Style-Guided Text-to-Image Generation and Style-Guided Text-to-Video Generation. <br>

1. โญโญ Style-Guided Text-to-Video Generation.

<div align="center"> <img src=docs/showcase_1.gif> <p>Style-guided text-to-video results. Resolution: 320 × 512; Frames: 16. (Compressed)</p> </div>

2. Style-Guided Text-to-Image Generation.

<div align="center"> <img src=docs/showcase_img.jpeg> <p>Style-guided text-to-image results. Resolution: 512 × 512. (Compressed)</p> </div>

๐Ÿ“ Changelog

🧰 Models

| Base Model | Gen Type | Resolution | Checkpoint | How to run |
| --- | --- | --- | --- | --- |
| VideoCrafter | Image/Video | 320×512 | Hugging Face | StyleCrafter on VideoCrafter |
| SDXL | Image | 1024×1024 | Hugging Face | StyleCrafter on SDXL |

On a single NVIDIA A100 (40G) GPU, it takes approximately 5 seconds to generate a 512×512 image and 85 seconds to generate a 320×512 video with 16 frames. A GPU with at least 16 GB of memory is required for inference.
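For rough capacity planning, the per-sample timings above can be folded into a small helper (a hypothetical sketch, not part of this repository):

```python
def estimate_runtime_seconds(n_images=0, n_videos=0,
                             sec_per_image=5, sec_per_video=85):
    """Estimate single-A100 wall-clock time from the per-sample
    timings reported above (5 s per image, 85 s per 16-frame video)."""
    return n_images * sec_per_image + n_videos * sec_per_video

# e.g. a batch of 10 images and 2 videos:
print(estimate_runtime_seconds(n_images=10, n_videos=2))  # 10*5 + 2*85 = 220
```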

โš™๏ธ Setup

```shell
conda create -n stylecrafter python=3.8.5
conda activate stylecrafter
pip install -r requirements.txt
```

💫 Inference

  1. Download all checkpoints according to the instructions.
  2. Run the commands in the terminal:

```shell
# style-guided text-to-image generation
sh scripts/run_infer_image.sh

# style-guided text-to-video generation
sh scripts/run_infer_video.sh
```

  3. (Optional) Run inference on your own data according to the instructions.

๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ Crafter Family

VideoCrafter1: Framework for high-quality text-to-video generation.

ScaleCrafter: Tuning-free method for high-resolution image/video generation.

TaleCrafter: An interactive story visualization tool that supports multiple characters.

LongerCrafter: Tuning-free method for longer high-quality video generation.

DynamiCrafter: Animate open-domain still images into high-quality videos.

📢 Disclaimer

We developed this repository for RESEARCH purposes, so it may only be used for personal, research, or other non-commercial purposes.


๐Ÿ™ Acknowledgements

We would like to thank AK (@_akhaliq) for his help in setting up the online demo.

📭 Contact

If you have any comments or questions, feel free to contact lgy22@mails.tsinghua.edu.cn.

BibTeX

```bibtex
@article{liu2023stylecrafter,
  title={StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter},
  author={Liu, Gongye and Xia, Menghan and Zhang, Yong and Chen, Haoxin and Xing, Jinbo and Wang, Xintao and Yang, Yujiu and Shan, Ying},
  journal={arXiv preprint arXiv:2312.00330},
  year={2023}
}
```