# DiffSynth Studio

<p align="center"> <a href="https://trendshift.io/repositories/10946" target="_blank"><img src="https://trendshift.io/api/badge/repositories/10946" alt="modelscope%2FDiffSynth-Studio | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> </p>

Documentation: https://diffsynth-studio.readthedocs.io/zh-cn/latest/index.html

## Introduction
DiffSynth Studio is a Diffusion engine. We have restructured architectures, including the Text Encoder, UNet, and VAE, among others, maintaining compatibility with models from the open-source community while enhancing computational performance. We provide many interesting features. Enjoy the magic of Diffusion models!

So far, DiffSynth Studio supports the following models:
- HunyuanVideo
- CogVideoX
- FLUX
- ExVideo
- Kolors
- Stable Diffusion 3
- Stable Video Diffusion
- Hunyuan-DiT
- RIFE
- ESRGAN
- IP-Adapter
- AnimateDiff
- ControlNet
- Stable Diffusion XL
- Stable Diffusion
## News
- December 19, 2024: We implement advanced VRAM management for HunyuanVideo, making it possible to generate videos at a resolution of 129x720x1280 using 24GB of VRAM, or at 129x512x384 resolution with just 6GB of VRAM. Please refer to `./examples/HunyuanVideo/` for more details.
- December 18, 2024: We propose ArtAug, an approach designed to improve text-to-image synthesis models through synthesis-understanding interactions. We have trained an ArtAug enhancement module for FLUX.1-dev in the form of a LoRA. This model integrates the aesthetic understanding of Qwen2-VL-72B into FLUX.1-dev, leading to an improvement in the quality of generated images.
  - Paper: https://arxiv.org/abs/2412.12888
  - Examples: https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/ArtAug
  - Model: ModelScope, HuggingFace
  - Demo: ModelScope, HuggingFace (coming soon)
- October 25, 2024: We provide extensive FLUX ControlNet support. This project supports many different ControlNet models that can be freely combined, even if their structures differ. Additionally, ControlNet models are compatible with high-resolution refinement and partition control techniques, enabling very powerful controllable image generation. See `./examples/ControlNet/`.
- October 8, 2024: We release an extended LoRA based on CogVideoX-5B and ExVideo. You can download this model from ModelScope or HuggingFace.
- August 22, 2024: CogVideoX-5B is supported in this project. See here. We provide several interesting features for this text-to-video model, including:
  - Text-to-video
  - Video editing
  - Self-upscaling
  - Video interpolation
- August 22, 2024: We have implemented an interesting painter that supports all text-to-image models. Now you can create stunning images using the painter, with assistance from AI!
  - Use it in our WebUI.
- August 21, 2024: FLUX is supported in DiffSynth-Studio.
  - Enable CFG and highres-fix to improve visual quality. See here.
  - LoRA, ControlNet, and additional models will be available soon.
- June 21, 2024: 🔥🔥🔥 We propose ExVideo, a post-tuning technique aimed at enhancing the capability of video generation models. We have extended Stable Video Diffusion to generate long videos of up to 128 frames.
  - Project Page
  - The source code is released in this repo. See `examples/ExVideo`.
  - Models are released on HuggingFace and ModelScope.
  - The technical report is released on arXiv.
  - You can try ExVideo in this Demo!
- June 13, 2024: DiffSynth Studio is transferred to ModelScope. The developers have transitioned from "I" to "we". Of course, I will still participate in development and maintenance.
- Jan 29, 2024: We propose Diffutoon, a fantastic solution for toon shading.
  - Project Page
  - The source code is released in this project.
  - The technical report (IJCAI 2024) is released on arXiv.
- Dec 8, 2023: We decided to develop a new project, aiming to unleash the potential of diffusion models, especially in video synthesis. Development of this project has started.
- Nov 15, 2023: We propose FastBlend, a powerful video deflickering algorithm.
- Oct 1, 2023: We release an early version of this project, namely FastSDXL, an attempt at building a diffusion engine.
  - The source code is released on GitHub.
  - FastSDXL includes a trainable OLSS scheduler for efficiency improvement.
- Aug 29, 2023: We propose DiffSynth, a video synthesis framework.
  - Project Page
  - The source code is released in EasyNLP.
  - The technical report (ECML PKDD 2024) is released on arXiv.
## Installation
Install from source code (recommended):

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
Or install from PyPI:

```shell
pip install diffsynth
```
## Usage (in Python code)
The Python examples are in `examples`. We provide an overview here.
### Download Models
Download the pre-set models. Model IDs can be found in the config file.

```python
from diffsynth import download_models

download_models(["FLUX.1-dev", "Kolors"])
```
Download your own models.

```python
from diffsynth.models.downloader import download_from_huggingface, download_from_modelscope

# From ModelScope (recommended)
download_from_modelscope("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae")
# From HuggingFace
download_from_huggingface("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.safetensors", "models/kolors/Kolors/vae")
```
### Video Synthesis

#### Text-to-video using CogVideoX-5B

CogVideoX-5B is released by Zhipu AI. We provide an improved pipeline supporting text-to-video, video editing, self-upscaling, and video interpolation. See `examples/video_synthesis`.
The video on the left is generated using the original text-to-video pipeline, while the video on the right is the result after editing and frame interpolation.
https://github.com/user-attachments/assets/26b044c1-4a60-44a4-842f-627ff289d006
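As a quick orientation, the snippet below sketches how a text-to-video run could look with this pipeline. The pipeline class name, local model paths, and parameter values are assumptions here, so treat the scripts in `examples/video_synthesis` as the authoritative reference.

```python
# Minimal text-to-video sketch (assumed API: the class name, model paths, and
# parameter values are illustrative; see examples/video_synthesis for the real scripts).
import torch
from diffsynth import ModelManager, CogVideoPipeline, save_video, download_models

download_models(["CogVideoX-5B"])  # fetch the pre-set model files

# Load the downloaded weights; dtype and device depend on your GPU.
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
model_manager.load_models([
    "models/CogVideo/CogVideoX-5b/text_encoder",  # assumed local paths
    "models/CogVideo/CogVideoX-5b/transformer",
    "models/CogVideo/CogVideoX-5b/vae/diffusion_pytorch_model.safetensors",
])

pipe = CogVideoPipeline.from_model_manager(model_manager)
video = pipe(
    prompt="a dog running on the beach at sunset, cinematic lighting",
    height=480, width=720,
    num_inference_steps=50, cfg_scale=7.0,
)
save_video(video, "video.mp4", fps=8, quality=5)
```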
#### Long Video Synthesis

We trained extended video synthesis models that can generate 128 frames. See `examples/ExVideo`.
https://github.com/modelscope/DiffSynth-Studio/assets/35051019/d97f6aa9-8064-4b5b-9d49-ed6001bb9acc
https://github.com/user-attachments/assets/321ee04b-8c17-479e-8a95-8cbcf21f8d7e
#### Toon Shading

Render realistic videos in a flat style and enable video editing features. See `examples/Diffutoon`.
https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd
https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/20528af5-5100-474a-8cdc-440b9efdd86c
#### Video Stylization

Video stylization without video models. See `examples/diffsynth`.
https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea
### Image Synthesis

Generate high-resolution images by breaking the limitations of diffusion models! See `examples/image_synthesis`. A minimal usage sketch follows the showcase table below.

LoRA fine-tuning is supported in `examples/train`.
| FLUX | Stable Diffusion 3 |
|---|---|

| Kolors | Hunyuan-DiT |
|---|---|

| Stable Diffusion | Stable Diffusion XL |
|---|---|
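To give a concrete starting point, the snippet below sketches how a FLUX text-to-image run could look. The pipeline class name, the `ModelManager` arguments, and the parameter values are assumptions here, so treat the scripts in `examples/image_synthesis` as the authoritative reference.

```python
# Minimal text-to-image sketch (assumed API: the class name, arguments, and
# parameter values are illustrative; see examples/image_synthesis for the real scripts).
import torch
from diffsynth import ModelManager, FluxImagePipeline, download_models

download_models(["FLUX.1-dev"])  # pre-set model ID from the config file

# Load the downloaded FLUX.1-dev weights; adjust dtype and device to your hardware.
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda",
                             model_id_list=["FLUX.1-dev"])  # model_id_list is assumed here
pipe = FluxImagePipeline.from_model_manager(model_manager)

image = pipe(
    prompt="a beautiful mountain landscape at sunrise, highly detailed",
    num_inference_steps=30,
    embedded_guidance=3.5,  # FLUX.1-dev uses embedded (distilled) guidance rather than classic CFG
)
image.save("image.jpg")
```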
## Usage (in WebUI)
Create stunning images using the painter, with assistance from AI!
https://github.com/user-attachments/assets/95265d21-cdd6-4125-a7cb-9fbcf6ceb7b0
This video is not rendered in real-time.
Before launching the WebUI, please download models to the folder `./models`. See here.
### Gradio version

```shell
pip install gradio
python apps/gradio/DiffSynth_Studio.py
```
### Streamlit version

```shell
pip install streamlit streamlit-drawable-canvas
python -m streamlit run apps/streamlit/DiffSynth_Studio.py
```
https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/93085557-73f3-4eee-a205-9829591ef954