
<div align="center">

# Q8 LTX-Video

This is the repository for Q8 LTX-Video.

[Q8 Weights](https://huggingface.co/konakona/ltxvideo_q8) | [Original repo](https://github.com/Lightricks/LTX-Video)

</div>


## Introduction

LTX-VideoQ8 is an 8-bit adaptation of [LTX-Video](https://github.com/Lightricks/LTX-Video) with no loss of accuracy and up to 3x speedup on NVIDIA Ada GPUs. Generate 720x480x121 videos in under a minute on an RTX 4060 Laptop GPU with 8 GB of VRAM. Training code is coming soon! (8 GB of VRAM is more than enough to fully fine-tune the 2B transformer on an Ada GPU with precalculated latents.)

## Benchmarks

All benchmarks: 40 steps, RTX 4060 Laptop GPU, CUDA 12.6, PyTorch 2.5.1.

*121x720x1280: in diffusers, it/s degrades as the run progresses; extrapolating from the it/s of the first 10 steps, ~7 min was expected rather than the measured ~9 min.
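The per-step slowdown can be checked directly by timing each denoising step. Below is a minimal sketch using the `callback_on_step_end` hook of a diffusers pipeline; the `LTXPipeline` class, checkpoint name, and prompt are illustrative assumptions (this repo ships its own `inference.py` instead).

```python
import time
import torch
from diffusers import LTXPipeline  # assumes a recent diffusers release

# Illustrative setup: record the wall time of every denoising step.
pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")

step_times = []
last = None

def log_step(pipeline, step, timestep, callback_kwargs):
    global last
    torch.cuda.synchronize()  # time finished GPU work, not queued kernels
    now = time.perf_counter()
    if last is not None:
        step_times.append(now - last)
    last = now
    return callback_kwargs

pipe(prompt="a ship sailing at sunset", num_inference_steps=40,
     callback_on_step_end=log_step)
print(step_times)  # seconds per step; it/s for step i is 1 / step_times[i]
```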

## Run locally

### Installation

The codebase was tested with Python 3.10.12 and CUDA 12.6, and supports PyTorch >= 2.5.1.

1) Install [q8_kernels](https://github.com/KONAKONA666/q8_kernels) (a minimal install sketch follows this list).

2) Clone this repository and install it:

```bash
git clone https://github.com/KONAKONA666/LTX-Video
cd LTX-Video
python -m pip install -e .[inference-script]
```
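For step 1), the commands below are a hedged sketch of a plain from-source install; the actual steps (submodules, build flags, required CUDA toolkit version) may differ, so follow the q8_kernels README.

```bash
# Sketch only: assumes a CUDA toolchain matching your PyTorch build.
# Consult the q8_kernels README for the authoritative install steps.
git clone https://github.com/KONAKONA666/q8_kernels
cd q8_kernels
python -m pip install -e .
```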

Then download the text encoder and VAE from Hugging Face. You can download the prequantized Q8 version, or convert the weights yourself with `q8_kernels.convert_weights`:

```python
from huggingface_hub import snapshot_download

model_path = 'PATH'  # local directory where the checkpoint will be saved
snapshot_download("konakona/ltxvideo_q8", local_dir=model_path,
                  local_dir_use_symlinks=False, repo_type='model')
```

### Inference

Follow the inference code in `inference.py`.

For text-to-video generation:

```bash
python inference.py --low_vram --transformer_type=q8_kernels --ckpt_dir 'PATH' --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
```
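For example, using the 720x480x121 setting from the introduction (the checkpoint directory, prompt, and seed are illustrative placeholders):

```bash
python inference.py --low_vram --transformer_type=q8_kernels --ckpt_dir './ltxvideo_q8' \
  --prompt "A red vintage car drives along a coastal road at sunset, the camera tracking smoothly from the side" \
  --height 480 --width 720 --num_frames 121 --seed 42
```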

For image-to-video generation:

```bash
python inference.py --ckpt_dir 'PATH' --low_vram --transformer_type=q8_kernels --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
```

## Comparison

Left: 8-bit; right: 16-bit.

Side-by-side comparisons are available in [docs/_static](https://github.com/KONAKONA666/LTX-Video/tree/main/docs/_static).

<!-- ![example1](./docs/_static/312661b4-974f-4db7-8e68-bc050debc782.gif) ![example2](./docs/_static/31632627-40ae-4dcf-aac9-99b70f908351.gif) ![example3](./docs/_static/62558328-6561-4486-9abe-4e13aa317577.gif) ![example4](./docs/_static/91d01bfa-e806-48b6-89b2-ed7a6733ac2f.gif) ![example5](./docs/_static/e37acb60-1f64-45b1-a8c1-4eff28af298a.gif) ![example5](./docs/_static/f989b225-8b82-4a2f-b119-91464803df95.gif) -->

## Model User Guide

### 📝 Prompt Engineering

When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details, all in a single flowing paragraph. Start directly with the action, keep descriptions literal and precise, and stay within 200 words. Think like a cinematographer describing a shot list; an illustrative example follows.
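A sample prompt in this style (written here for illustration, not taken from the model card):

```
A woman in a yellow raincoat walks across a rain-slicked city street at dusk, holding a black umbrella. The camera tracks her from the side at shoulder height as neon signs reflect in the puddles around her. She pauses at the curb, lowers the umbrella, and looks up past the camera as cars pass behind her with glowing headlights. Soft diffused lighting, shallow depth of field, 35mm film look.
```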

### 🎮 Parameter Guide

More to come...

## Acknowledgement

We are grateful to the following awesome projects, which were used when implementing LTX-Video: