VidGen-1M: A Large-Scale Dataset for Text-to-Video Generation

arXiv Project Page

Introduction

We present VidGen-1M, a superior training dataset for text-to-video models. Produced through a coarse-to-fine curation strategy, the dataset provides high-quality videos and detailed captions with excellent temporal consistency. We trained a video generation model on this data and have open-sourced it.

News

Contents

Install

  1. Clone this repository
  2. Install Package
conda create -n vidgen python=3.10
conda activate vidgen

pip install torch==2.2.2 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tqdm einops omegaconf bigmodelvis deepspeed tensorboard timm==0.9.16 ninja opencv-python opencv-python-headless ftfy bs4 beartype colossalai accelerate ultralytics webdataset

pip install -U xformers --index-url https://download.pytorch.org/whl/cu118
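
After installation, a quick sanity check can confirm that PyTorch sees a CUDA device and that xformers imports cleanly. This is a minimal sketch, not part of the original setup steps:

# Optional: verify that the core dependencies installed correctly.
import torch
import xformers

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("xformers", xformers.__version__)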

VidGen-1M Datasets

To assist the community in researching and learning about video generation, we have publicly released the high-quality VidGen-1M video data.
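
Since the install list includes webdataset, the released data can presumably be streamed with it. The sketch below assumes tar shards containing mp4/txt pairs at a hypothetical local path; adjust the shard pattern and sample keys to the actual release layout:

import webdataset as wds

# Hypothetical shard pattern and sample keys; replace with the actual release layout.
dataset = wds.WebDataset("data/vidgen-1m-{000000..000099}.tar").to_tuple("mp4", "txt")

for video_bytes, caption in dataset:
    # video_bytes: raw mp4 bytes; caption: UTF-8 caption text for the clip
    print(len(video_bytes), caption.decode("utf-8")[:80])
    break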

Model Weights

Please download the model weights from Hugging Face.
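
A minimal download sketch using huggingface_hub; the repository id below is a placeholder, so substitute the actual model repo:

from huggingface_hub import snapshot_download

# Placeholder repo id; replace with the actual VidGen model repository on Hugging Face.
ckpt_dir = snapshot_download(repo_id="<org>/<vidgen-model>", local_dir="./checkpoints")
print("weights downloaded to", ckpt_dir)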

Sampling

You can run inference on a single GPU or on multiple GPUs. The sampling script accepts various arguments; see scripts/sample_t2v.sh for details.

bash scripts/sample_t2v.sh
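
For multi-GPU runs, one option is to restrict the visible devices before invoking the script. This sketch assumes scripts/sample_t2v.sh honors CUDA_VISIBLE_DEVICES:

import os
import subprocess

# Assumption: the sampling script uses whatever GPUs CUDA_VISIBLE_DEVICES exposes.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0,1")
subprocess.run(["bash", "scripts/sample_t2v.sh"], env=env, check=True)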

Citation

@article{tan2024vidgen-1m,
  title={VidGen-1M: A Large-Scale Dataset for Text-to-Video Generation},
  author={Tan, Zhiyu and Yang, Xiaomeng and Qin, Luozheng and Li, Hao},
  journal={arXiv preprint arXiv:2408.02629},
  year={2024},
  institution={Fudan University and Shanghai Academy of AI for Science},
}