
<div align="center">

# Q8 LTX-Video

This is the repository for Q8 LTX-Video.

[Q8 Weights](https://huggingface.co/konakona/ltxvideo_q8) | [Original repo](https://github.com/Lightricks/LTX-Video)

</div>


## Introduction

LTX-VideoQ8 is an 8-bit adaptation of [LTX-Video](https://github.com/Lightricks/LTX-Video) with no loss of accuracy and up to 3x speedup on NVIDIA Ada GPUs. Generate 720x480x121 videos in under a minute on an RTX 4060 Laptop GPU with 8 GB of VRAM. Training code is coming soon! (8 GB of VRAM is more than enough to fully fine-tune the 2B transformer on an Ada GPU with precalculated latents.)

## Benchmarks

All benchmarks: 40 steps, RTX 4060 Laptop GPU, CUDA 12.6, PyTorch 2.5.1.

*121x720x1280: in diffusers, it/s degrades as the run progresses; extrapolating from the it/s of the first 10 steps, ~7 min was expected rather than the measured ~9 min.
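The per-step slowdown can be checked directly by timing each denoising step. Below is a minimal sketch using the `callback_on_step_end` hook of a diffusers pipeline; the `LTXPipeline` class, checkpoint name, and prompt are illustrative assumptions (this repo ships its own `inference.py` instead).

```python
import time
import torch
from diffusers import LTXPipeline  # assumes a recent diffusers release

# Illustrative setup: record the wall time of every denoising step.
pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")

step_times = []
last = None

def log_step(pipeline, step, timestep, callback_kwargs):
    global last
    torch.cuda.synchronize()  # time finished GPU work, not queued kernels
    now = time.perf_counter()
    if last is not None:
        step_times.append(now - last)
    last = now
    return callback_kwargs

pipe(prompt="a ship sailing at sunset", num_inference_steps=40,
     callback_on_step_end=log_step)
print(step_times)  # seconds per step; it/s for step i is 1 / step_times[i]
```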

## Run locally

### Installation

The codebase was tested with Python 3.10.12 and CUDA 12.6, and supports PyTorch >= 2.5.1.

1) Install [q8_kernels](https://github.com/KONAKONA666/q8_kernels) (a minimal install sketch follows this list).

2) Clone this repository and install it:

```bash
git clone https://github.com/KONAKONA666/LTX-Video
cd LTX-Video
python -m pip install -e .[inference-script]
```
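For step 1), the commands below are a hedged sketch of a plain from-source install; the actual steps (submodules, build flags, required CUDA toolkit version) may differ, so follow the q8_kernels README.

```bash
# Sketch only: assumes a CUDA toolchain matching your PyTorch build.
# Consult the q8_kernels README for the authoritative install steps.
git clone https://github.com/KONAKONA666/q8_kernels
cd q8_kernels
python -m pip install -e .
```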

Then download the text encoder and VAE from Hugging Face. You can download the prequantized Q8 version, or convert the weights yourself with `q8_kernels.convert_weights`:

```python
from huggingface_hub import snapshot_download

model_path = 'PATH'  # local directory where the checkpoint will be saved
snapshot_download("konakona/ltxvideo_q8", local_dir=model_path,
                  local_dir_use_symlinks=False, repo_type='model')
```

### Inference

Follow the inference code in `inference.py`.

For text-to-video generation:

```bash
python inference.py --low_vram --transformer_type=q8_kernels --ckpt_dir 'PATH' --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
```
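For example, using the 720x480x121 setting from the introduction (the checkpoint directory, prompt, and seed are illustrative placeholders):

```bash
python inference.py --low_vram --transformer_type=q8_kernels --ckpt_dir './ltxvideo_q8' \
  --prompt "A red vintage car drives along a coastal road at sunset, the camera tracking smoothly from the side" \
  --height 480 --width 720 --num_frames 121 --seed 42
```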

For image-to-video generation:

```bash
python inference.py --ckpt_dir 'PATH' --low_vram --transformer_type=q8_kernels --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
```

## Comparison

Left: 8-bit; right: 16-bit.

Side-by-side comparisons are available in [docs/_static](https://github.com/KONAKONA666/LTX-Video/tree/main/docs/_static).

<!-- ![example1](./docs/_static/312661b4-974f-4db7-8e68-bc050debc782.gif) ![example2](./docs/_static/31632627-40ae-4dcf-aac9-99b70f908351.gif) ![example3](./docs/_static/62558328-6561-4486-9abe-4e13aa317577.gif) ![example4](./docs/_static/91d01bfa-e806-48b6-89b2-ed7a6733ac2f.gif) ![example5](./docs/_static/e37acb60-1f64-45b1-a8c1-4eff28af298a.gif) ![example5](./docs/_static/f989b225-8b82-4a2f-b119-91464803df95.gif) -->

## Model User Guide

### 📝 Prompt Engineering

When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details, all in a single flowing paragraph. Start directly with the action, keep descriptions literal and precise, and stay within 200 words. Think like a cinematographer describing a shot list; an illustrative example follows.
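A sample prompt in this style (written here for illustration, not taken from the model card):

```
A woman in a yellow raincoat walks across a rain-slicked city street at dusk, holding a black umbrella. The camera tracks her from the side at shoulder height as neon signs reflect in the puddles around her. She pauses at the curb, lowers the umbrella, and looks up past the camera as cars pass behind her with glowing headlights. Soft diffused lighting, shallow depth of field, 35mm film look.
```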

### 🎮 Parameter Guide

More to come...

## Acknowledgement

We are grateful to the following awesome projects, which were used when implementing LTX-Video: