
<p align="center"> <br> <img src="https://modelscope.oss-cn-beijing.aliyuncs.com/modelscope.gif" width="400"/> <br> <h1>Normal-Depth Diffusion Model</h1> <p>

Normal-Depth Diffusion Model: A Generalizable Normal-Depth Diffusion Model.

If you are familiar with Chinese, you can read the Chinese version of the README.

Text-to-ND

teaser-nd

Text-to-ND-MV

Project page | Paper | YouTube

Inference code.
Training code.
Pretrained model: ND, ND-MV, Albedo-MV.
Pretrained model: ND-MV-VAE.
Rendered Multi-View Image of Objaverse-dataset.

News

2023-12-25: We released the training dataset mvs_objaverse through Alibaba OSS Service. We also provide a convenient multi-threaded script for fast downloading.
2023-12-11: Inference code and pretrained models are released. We are working on improving the ND-Diffusion model; stay tuned!

3D Generation

This repository includes only the diffusion model and the 2D image generation code of the RichDreamer paper.
For 3D Generation, please check RichDreamer.

Preparation for inference

Install the requirements using the following commands.

conda create -n nd
conda activate nd 
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/CompVis/taming-transformers.git
pip install webdataset
pip install img2dataset
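
After installation, a quick import check can confirm that the environment is usable. The sketch below assumes the packages above expose the import names `clip`, `taming`, `webdataset`, and `img2dataset`; these names are an assumption based on the upstream packages and are not stated in this README. `missing_modules` is a hypothetical helper, not part of the repository.

```python
import importlib.util

# Assumed import names for the packages installed above (not stated in this README).
ASSUMED_MODULES = ["clip", "taming", "webdataset", "img2dataset"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```

Run it inside the activated `nd` environment; any name it reports points to an install step that likely did not complete.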

We also provide a Dockerfile for building a Docker image.

sudo docker build -t mv3dengine_22.04:cu118 -f docker/Dockerfile .

Download pretrained weights.

ND: Normal-Depth Diffusion trained on Laion-2B
ND-MV: MultiView Normal-Depth Diffusion Model
Albedo-MV: MultiView Depth-conditioned Albedo Diffusion Model

We also provide a script for downloading the weights.

python tools/download_models/download_nd_models.py

Inference (Sampling)

We provide a script for sampling:

sh demo_inference.sh

Or use the following detailed instructions:

Text2ND sampling

# DPM solver
python ./scripts/t2i.py --ckpt $ckpt_path --prompt "$prompt" --dpm_solver --n_samples 2 --save_dir $save_dir
# PLMS solver
python ./scripts/t2i.py --ckpt $ckpt_path --prompt "$prompt" --plms --n_samples 2 --save_dir $save_dir
# DDIM solver (used when no solver flag is given)
python ./scripts/t2i.py --ckpt $ckpt_path --prompt "$prompt" --n_samples 2 --save_dir $save_dir
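
The three solver variants above differ only in one flag, so they can be driven from Python with a small wrapper. This is a minimal sketch, not part of the repository: `build_t2i_command` and `run_t2i` are hypothetical helpers, and only the flags shown in the commands above are used. Passing the arguments as a list also means a prompt containing spaces reaches the script as a single argument.

```python
import subprocess

# Solver name -> extra flag, taken from the commands above.
# No flag is passed for DDIM, matching the third command.
SOLVER_FLAGS = {"dpm": ["--dpm_solver"], "plms": ["--plms"], "ddim": []}


def build_t2i_command(ckpt_path, prompt, save_dir, solver="ddim", n_samples=2):
    """Build the argument list for ./scripts/t2i.py (hypothetical helper)."""
    if solver not in SOLVER_FLAGS:
        raise ValueError(f"unknown solver: {solver!r}")
    return [
        "python", "./scripts/t2i.py",
        "--ckpt", ckpt_path,
        "--prompt", prompt,
        *SOLVER_FLAGS[solver],
        "--n_samples", str(n_samples),
        "--save_dir", save_dir,
    ]


def run_t2i(ckpt_path, prompt, save_dir, solver="ddim", n_samples=2):
    """Run the sampling script; call this from the repository root."""
    cmd = build_t2i_command(ckpt_path, prompt, save_dir, solver, n_samples)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `run_t2i("/path/to/nd.ckpt", "a cute cat", "./outputs", solver="dpm")` would correspond to the first command above, with placeholder paths.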

Text2ND-MV sampling

# nd-mv
python ./scripts/t2i_mv.py --ckpt_path $ckpt_path --prompt "$prompt" --num_frames 4 --model_name nd-mv --save_dir $save_dir

# nd-mv with VAE (coming soon)
python ./scripts/t2i_mv.py --ckpt_path $ckpt_path --prompt "$prompt" --num_frames 4 --model_name nd-mv-vae --save_dir $save_dir

Text2Albedo-MV sampling

python ./scripts/td2i_mv.py --ckpt_path $ckpt_path --prompt "$prompt" --depth_file $depth_file --num_frames 4 --model_name albedo-mv --save_dir $save_dir

Preparation for training

Download Laion-2B-en-5-AES (Required to train ND model)

Download the Laion-2B dataset from parquet, then put the parquet files into ./laion2b-dataset-5-aes.

cd ./tools/download_dataset
bash ./download_2b-5_aes.sh
cd -

Download Monocular Prior Models' Weights (Required to train ND model)

NormalBae scannet.pt
Midas3.1 dpt_beit_large512.pt

# move the scannet.pt to normalbae Prior Model
mv scannet.pt ./libs/ControlNet-v1-1-nightly/annotator/normalbae/scannet.pt
# move the dpt_beit_large512.pt to ./libs/omnidata_torch/pretrained_models/dpt_beit_large_512.pt
mv dpt_beit_large512.pt ./libs/omnidata_torch/pretrained_models/dpt_beit_large_512.pt
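
Before starting a training run, it can help to confirm that both prior-model weights ended up where the `mv` commands above place them. The following is a minimal sketch; `missing_prior_weights` is a hypothetical helper, and the two paths are copied from the commands above.

```python
from pathlib import Path

# Target locations taken from the `mv` commands above, relative to the repository root.
PRIOR_WEIGHTS = [
    "libs/ControlNet-v1-1-nightly/annotator/normalbae/scannet.pt",
    "libs/omnidata_torch/pretrained_models/dpt_beit_large_512.pt",
]


def missing_prior_weights(repo_root="."):
    """Return the expected weight files that are not present under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in PRIOR_WEIGHTS if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_prior_weights():
        print("missing:", rel)
```

An empty result means both files are in place.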

Download rendered Multi-View image of Objaverse-dataset (Required to train ND-MV and Albedo-MV model)

Download our rendered dataset using the prepared script:

wget https://virutalbuy-public.oss-cn-hangzhou.aliyuncs.com/share/aigc3d/valid_paths_v4_cap_filter_thres_28.json
# Example: python ./scripts/data/download_objaverse.py ./mvs_objaverse ./valid_paths_v4_cap_filter_thres_28.json 50
# Usage: the last argument is the number of download threads (e.g. 10)
python ./scripts/data/download_objaverse.py /path/to/savedata /path/to/valid_paths_v4_cap_filter_thres_28.json 10
# set up a symlink if you saved the data somewhere else
ln -s /path/to/savedata mvs_objaverse
# caption file
wget https://virutalbuy-public.oss-cn-hangzhou.aliyuncs.com/share/aigc3d/text_captions_cap3d.json
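
The download script takes the number of threads as its last argument. Its internals are not shown in this README, so the following is only a generic sketch of the multi-threaded pattern such a script might use, not the repository's implementation. `download_all` and the `fetch` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(items, fetch, n_threads=10):
    """Call `fetch(item)` for every item using a pool of worker threads.

    `fetch` is a hypothetical callable supplied by the caller; it should
    download one item and return a value describing the result.
    Results are returned in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(fetch, items))
```

Threads suit this kind of job because each worker spends most of its time waiting on the network rather than on the CPU.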

Training

Training Normal-Depth-VAE Model

Download the VAE weights pretrained on ImageNet.
Modify the config file configs/autoencoder_normal_depth/autoencoder_normal_depth.yaml: set model.ckpt_path=/path/to/pretrained-VAE weights

# train the VAE on webdataset shards
bash ./scripts/train_vae/train_nd_vae/train_rgbd_vae_webdatasets.sh \
    model.params.ckpt_path=${pretrained-VAE weights} \
    data.params.train.params.curls='path_laion/{00000..${:5 end_id}}.tar' \
    --gpus 0,1,2,3,4,5,6,7
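
The `curls` value is a brace-expansion pattern over the shard files. The sketch below builds such a pattern on the assumption that `${:5 end_id}` stands for the index of the last shard, zero-padded to five digits like the `00000` start index. That reading is an interpretation of the placeholder, and `shard_pattern` is a hypothetical helper.

```python
def shard_pattern(prefix, end_id, width=5):
    """Return a brace-expansion pattern covering shards 0..end_id (inclusive).

    Assumes shard files are named with zero-padded indices of `width` digits.
    """
    if end_id < 0:
        raise ValueError("end_id must be non-negative")
    if len(str(end_id)) > width:
        raise ValueError("end_id does not fit in the given width")
    start = str(0).zfill(width)
    end = str(end_id).zfill(width)
    return prefix + "/{" + start + ".." + end + "}.tar"


print(shard_pattern("path_laion", 123))  # path_laion/{00000..00123}.tar
```

The resulting string can be passed as the value of `data.params.train.params.curls`.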

Training Normal-Depth-Diffusion Model

After training the Normal-Depth-VAE model, or downloading it from ND-VAE, train the diffusion model in two steps:

# step 1
# ${Normal-Depth-VAE} stands for the path to your Normal-Depth-VAE checkpoint
export SD_MODEL_PATH=/path/to/sd-1.5
bash scripts/train_normald_sd/txt_cond/web_datasets/train_normald_webdatasets.sh --gpus 0,1,2,3,4,5,6,7 \
    model.params.first_stage_ckpts=${Normal-Depth-VAE} model.params.ckpt_path=${SD_MODEL_PATH} \
    data.params.train.params.curls='path_laion/{00000..${:5 end_id}}.tar'

# step 2: set your step_weights path in ./configs/stable-diffusion/normald/sd_1_5/txt_cond/web_datasets/laion_2b_step2.yaml
bash scripts/train_normald_sd/txt_cond/web_datasets/train_normald_webdatasets_step2.sh --gpus 0,1,2,3,4,5,6,7 \
    model.params.first_stage_ckpts=${Normal-Depth-VAE} \
    model.params.ckpt_path=${pretrained-step-weights} \
    data.params.train.params.curls='path_laion/{00000..${:5 end_id}}.tar'

Training MultiView-Normal-Depth-Diffusion Model

After training the Normal-Depth-Diffusion model, or downloading it from ND, you can train the multi-view model.

We provide two versions of the MultiView-Normal-Depth Diffusion Model:

a. without VAE denoising
b. with VAE denoising

The current release provides the version without VAE denoising.

# a. Training Without VAE version
bash ./scripts/train_normald_sd/txt_cond/objaverse/objaverse_finetune_wovae_mvsd-4.sh --gpus 0,1,2,3,4,5,6,7,  \
    model.params.ckpt_path=${Normal-Depth-Diffusion}
# b. Training with VAE version
bash ./scripts/train_normald_sd/txt_cond/objaverse/objaverse_finetune_mvsd-4.sh --gpus 0,1,2,3,4,5,6,7, \
    model.params.ckpt_path=${Normal-Depth-Diffusion}

Training MultiView-Depth-Conditioned-Albedo-Diffusion Model

After training the Normal-Depth-Diffusion model, or downloading it from ND, run:

bash scripts/train_abledo/objaverse/objaverse_finetune_mvsd-4.sh --gpus 0,1,2,3,4,5,6,7, model.params.ckpt_path=${Normal-Depth-Diffusion}

Acknowledgement

We have borrowed extensively from the code of the following repositories. Many thanks to the authors for sharing their code.

Citation

@article{qiu2023richdreamer,
    title={RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D}, 
    author={Lingteng Qiu and Guanying Chen and Xiaodong Gu and Qi Zuo and Mutian Xu and Yushuang Wu and Weihao Yuan and Zilong Dong and Liefeng Bo and Xiaoguang Han},
    year={2023},
    journal = {arXiv preprint arXiv:2311.16918}
}