

Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models

The official implementation of the work "Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models".

[Project Page] | [Arxiv] | [Video (Youtube)] | [Video (Bilibili)] | [Huggingface Dataset]

Image-to-4D


Text-to-4D


3D-to-4D


News

4D Dataset Preparation


We collected a large-scale, high-quality dynamic 3D (4D) dataset sourced from the vast 3D data corpus of Objaverse-1.0 and Objaverse-XL, and applied a series of empirical rules to curate it; more details are in our paper. In this part, we will release the selected 4D assets, including:

  1. Curated high-quality 4D object IDs.
  2. A render script using Blender, providing optional settings to render your personalized data.
  3. 4D images of Objaverse-1.0 and Objaverse-XL rendered by our team, to save you GPU time. With 8 GPUs and 16 threads in total, rendering the curated Objaverse-1.0 subset took 5.5 days, and rendering the Objaverse-XL subset took about 30 days.
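To put the rendering times above on a common scale, here is the GPU cost expressed in GPU-days and GPU-hours. This is a rough estimate that assumes all 8 GPUs were busy for the entire period:

```latex
\begin{aligned}
\text{Objaverse-1.0:}\quad & 8\ \text{GPUs} \times 5.5\ \text{days} = 44\ \text{GPU-days} = 1056\ \text{GPU-hours} \\
\text{Objaverse-XL:}\quad & 8\ \text{GPUs} \times 30\ \text{days} = 240\ \text{GPU-days} = 5760\ \text{GPU-hours}
\end{aligned}
```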

4D Dataset ID/Metadata

We first collected 365k dynamic 3D assets from Objaverse-1.0 (42k) and Objaverse-XL (323k), then curated a high-quality subset to train our models.

The uncurated list of 42k IDs covering all animated objects from Objaverse-1.0 is in rendering/src/ObjV1_all_animated.txt. The curated ~11k IDs of animated objects from Objaverse-1.0 are in rendering/src/ObjV1_curated.txt.
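To sanity-check the two ID lists before downloading anything, you can load them and compare them. The sketch below assumes each file holds one object UID per line. The helper names `parse_ids`, `load_ids`, and `not_in` are hypothetical and are not part of the repository:

```python
from typing import Iterable, List


def parse_ids(lines: Iterable[str]) -> List[str]:
    """Return stripped, non-empty IDs in file order, dropping duplicates.

    Assumes one UID per line (hypothetical format, check the actual files).
    """
    seen = set()
    ids: List[str] = []
    for line in lines:
        uid = line.strip()
        if uid and uid not in seen:
            seen.add(uid)
            ids.append(uid)
    return ids


def load_ids(path: str) -> List[str]:
    """Read an ID list file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_ids(f)


def not_in(subset: Iterable[str], superset: Iterable[str]) -> List[str]:
    """IDs from `subset` that do not appear in `superset`."""
    pool = set(superset)
    return [uid for uid in subset if uid not in pool]


if __name__ == "__main__":
    # Paths as listed above, relative to the repository root.
    animated = load_ids("rendering/src/ObjV1_all_animated.txt")
    curated = load_ids("rendering/src/ObjV1_curated.txt")
    print(f"{len(curated)} of {len(animated)} animated objects kept "
          f"({len(curated) / len(animated):.1%})")
    print("curated IDs absent from the animated list:",
          len(not_in(curated, animated)))
```

If the curated list is a strict subset of the animated list, the second count should be zero.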

Metadata of the 323k animated objects from Objaverse-XL is available on Hugging Face. We also release the metadata of all successfully rendered objects from Objaverse-XL's GitHub subset.

For text-to-4D generation, the captions are taken from Cap3D.
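A minimal sketch for attaching Cap3D captions to the curated objects is shown below. It assumes the captions come as a header-less two-column CSV (`uid`, `caption`), which is how we understand Cap3D distributes its Objaverse captions. Verify this against the file you actually download. The file name `Cap3D_captions.csv` and the helper names are hypothetical:

```python
import csv
from typing import Dict, Iterable, Sequence


def parse_captions(rows: Iterable[Sequence[str]],
                   uids: Iterable[str]) -> Dict[str, str]:
    """Map each requested UID to its caption, skipping malformed rows.

    Assumes each row is (uid, caption); this column layout is an assumption.
    """
    wanted = set(uids)
    captions: Dict[str, str] = {}
    for row in rows:
        if len(row) >= 2 and row[0] in wanted:
            captions[row[0]] = row[1]
    return captions


def load_captions(csv_path: str, uids: Iterable[str]) -> Dict[str, str]:
    """Read a caption CSV from disk and keep only the requested UIDs."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return parse_captions(csv.reader(f), uids)


if __name__ == "__main__":
    with open("rendering/src/ObjV1_curated.txt", encoding="utf-8") as f:
        curated = [line.strip() for line in f if line.strip()]
    # Hypothetical local file name for the downloaded Cap3D captions.
    captions = load_captions("Cap3D_captions.csv", curated)
    print(f"{len(captions)} of {len(curated)} curated objects have a caption")
```

Objects without a caption simply do not appear in the returned dictionary, so they can be excluded from text-to-4D training.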

4D Dataset Rendering Script

  1. Clone the repository and enter the rendering directory:
git clone https://github.com/VITA-Group/Diffusion4D.git && \
cd Diffusion4D/rendering
  2. Download Blender:
wget https://download.blender.org/release/Blender3.2/blender-3.2.2-linux-x64.tar.xz && \
tar -xf blender-3.2.2-linux-x64.tar.xz && \
rm blender-3.2.2-linux-x64.tar.xz
  3. Download 4D objects:
pip install objaverse
python download.py --id_path src/sample.txt

Please change objaverse._VERSIONED_PATH in download.py to the directory where you want to store the GLB files. By default, they are downloaded to obj_v1/.
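download.py itself is not reproduced here. The following is a minimal sketch of a programmatic equivalent built on the `objaverse` package's `load_objects` function. It assumes the ID file holds one UID per line. `read_uids` and `download_glbs` are hypothetical helper names. The storage directory is set by overriding `objaverse._VERSIONED_PATH`, as the note above describes:

```python
import os
from typing import Dict, List


def read_uids(id_path: str) -> List[str]:
    """Read one UID per line (assumed format), skipping blanks and duplicates."""
    seen = set()
    uids: List[str] = []
    with open(id_path, encoding="utf-8") as f:
        for line in f:
            uid = line.strip()
            if uid and uid not in seen:
                seen.add(uid)
                uids.append(uid)
    return uids


def download_glbs(id_path: str, store_dir: str = "obj_v1",
                  processes: int = 1) -> Dict[str, str]:
    """Download the listed objects and return {uid: local .glb path}."""
    import objaverse  # imported lazily so read_uids works without the package

    os.makedirs(store_dir, exist_ok=True)
    # Same effect as editing objaverse._VERSIONED_PATH in download.py:
    # load_objects reads this module-level path when it is called.
    objaverse._VERSIONED_PATH = store_dir
    return objaverse.load_objects(uids=read_uids(id_path),
                                  download_processes=processes)


if __name__ == "__main__":
    paths = download_glbs("src/sample.txt", "obj_v1", processes=8)
    print(f"downloaded {len(paths)} objects")
```

The downloaded files should end up under obj_v1/glbs/, which matches the --obj_path used in the render command.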

  4. Render 4D images:
python render.py --obj_path "./obj_v1/glbs" \
                --save_dir './output' \
                --gpu_num 8           \
                --frame_num 24        \
                --azimuth_aug  1      \
                --elevation_aug 0     \
                --resolution 256      \
                --mode_multi 1        \
                --mode_static 1       \
                --mode_front_view 0   \
                --mode_four_view 0
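The sketch below shows one way --frame_num could translate into a camera path. It assumes, based on the output file names, that both main modes orbit the camera once around the object, with the animation advancing per frame in mode_multi and frozen at the first step in mode_static. This is a hypothetical illustration, not render.py's actual code. The class, the function, the z-up convention, and the default radius are all assumptions:

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OrbitCamera:
    frame: int                             # output frame index (0 .. frame_num-1)
    time_idx: int                          # animation step shown (hypothetical)
    azimuth_deg: float                     # camera azimuth in degrees
    position: Tuple[float, float, float]   # camera location, z-up (assumed)


def orbit_schedule(frame_num: int = 24, elevation_deg: float = 0.0,
                   radius: float = 2.0, azimuth_offset_deg: float = 0.0,
                   static: bool = False) -> List[OrbitCamera]:
    """Evenly spaced cameras on one 360-degree orbit around the origin.

    static=False: the animation advances with the camera (dynamic orbit).
    static=True: every view shows animation step 0 (static 3D orbit).
    """
    el = math.radians(elevation_deg)
    cams: List[OrbitCamera] = []
    for i in range(frame_num):
        az_deg = (azimuth_offset_deg + 360.0 * i / frame_num) % 360.0
        az = math.radians(az_deg)
        pos = (radius * math.cos(el) * math.cos(az),
               radius * math.cos(el) * math.sin(az),
               radius * math.sin(el))
        cams.append(OrbitCamera(i, 0 if static else i, az_deg, pos))
    return cams


if __name__ == "__main__":
    for cam in orbit_schedule(frame_num=4):
        print(cam)
```

A random `azimuth_offset_deg` is one plausible way to implement an azimuth augmentation, but the exact behavior of --azimuth_aug and --elevation_aug is not documented in this section.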

Script Explanation:

Output Explanation:

output
├── object1
│   ├── multi_frame0-23.png                   # mode_multi outputs
│   ├── multi0-23.json                        # mode_multi cameras
│   │
│   ├── multi_static_frame0-23.png            # mode_static outputs
│   ├── static0-23.json                       # mode_static cameras
│   │
│   │   # optional
│   ├── front_frame0-23.png                   # mode_front_view outputs
│   ├── front.json                            # mode_front_view cameras
│   ├── front/left/right/back_frame0-23.png   # mode_four_view outputs
│   └── front/left/right/back.json            # mode_four_view cameras
├── object2
│   └── ...
└── object3
    └── ...
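Long rendering runs can leave some objects only partly rendered. The sketch below scans the output folder for missing files. It assumes that a name such as `multi_frame0-23.png` expands to one file per frame (`multi_frame0.png` through `multi_frame23.png`) with no zero padding, and it only checks the two default modes. The helper names are hypothetical:

```python
from pathlib import Path
from typing import Dict, Iterable, List


def expected_files(frame_num: int = 24, multi: bool = True,
                   static: bool = True) -> List[str]:
    """File names one object folder should contain (assumed naming scheme)."""
    names: List[str] = []
    for i in range(frame_num):
        if multi:
            names += [f"multi_frame{i}.png", f"multi{i}.json"]
        if static:
            names += [f"multi_static_frame{i}.png", f"static{i}.json"]
    return names


def missing_files(present: Iterable[str], expected: List[str]) -> List[str]:
    """Expected names that are not among the files present."""
    have = set(present)
    return [name for name in expected if name not in have]


def find_incomplete(output_dir: str, frame_num: int = 24, multi: bool = True,
                    static: bool = True) -> Dict[str, List[str]]:
    """Map each incomplete object folder to the files it is missing."""
    expected = expected_files(frame_num, multi, static)
    report: Dict[str, List[str]] = {}
    for obj_dir in sorted(p for p in Path(output_dir).iterdir() if p.is_dir()):
        missing = missing_files((f.name for f in obj_dir.iterdir()), expected)
        if missing:
            report[obj_dir.name] = missing
    return report


if __name__ == "__main__":
    for name, missing in find_incomplete("./output").items():
        print(f"{name}: {len(missing)} files missing, e.g. {missing[0]}")
```

The names of incomplete objects can then be collected into a new ID list and passed back through the download and render steps.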

Our rendering script is based on the Point-E and Objaverse rendering scripts. Thanks a lot to all the authors for sharing!

The rest of the code will be released soon!

Acknowledgement

This project is based on numerous outstanding research efforts and open-source contributions. We are deeply grateful to all the authors for their generosity in sharing their work!

If you find this repository/work/dataset helpful in your research, please consider citing the paper and starring the repo ⭐.

@article{liang2024diffusion4d,
  title={Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models},
  author={Liang, Hanwen and Yin, Yuyang and Xu, Dejia and Liang, Hanxue and Wang, Zhangyang and Plataniotis, Konstantinos N and Zhao, Yao and Wei, Yunchao},
  journal={arXiv preprint arXiv:2405.16645},
  year={2024}
}