<p align="center"> <picture> <img alt="threestudio" src="https://user-images.githubusercontent.com/19284678/236847132-219999d0-4ffa-4240-a262-c2c025d15d9e.png" width="50%"> </picture> </p> <p align="center"><b> threestudio is a unified framework for 3D content creation from text prompts, single images, and few-shot images, by lifting 2D text-to-image generation models. </b></p> <p align="center"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/f48eca9f-45a7-4092-a519-6bb99f4939e4.gif" width="100%"> <br/> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/01a00207-3240-4a8e-aa6f-d48436370fe7.png" width="100%"> <br/> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/1dbdebab-43d5-4830-872c-66b38d9fda92" width="48%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/d746b874-d82f-4977-a549-98d9ba764dfc" width="25%"> <img alt="threestudio" src="https://github.com/user-attachments/assets/afcf74ee-85ff-4792-b109-191f54b44edd" width="24%"> <br/> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/437b4044-142c-4e5d-a406-4d9bad0205e1" width="48%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/812741c0-7229-412e-b6ab-81e377890f04" width="25%"> <img alt="threestudio" src="https://github.com/user-attachments/assets/c0858bc5-6b9d-446a-b5df-76534c8a3072" width="25%"> <br/> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/4f4d62c5-2304-4e20-b632-afe6d144a203" width="68%"> <br/> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/2f36ddbd-e3cf-4431-b269-47a9cb3d6e6e" width="68%"> <br/> </p> <p align="center"><b> ๐Ÿ‘† Results obtained from methods implemented by threestudio ๐Ÿ‘† <br/> | <a href="https://ml.cs.tsinghua.edu.cn/prolificdreamer/">ProlificDreamer</a> | <a href="https://dreamfusion3d.github.io/">DreamFusion</a> | <a href="https://research.nvidia.com/labs/dir/magic3d/">Magic3D</a> | <a href="https://pals.ttic.edu/p/score-jacobian-chaining">SJC</a> | <a href="https://github.com/eladrich/latent-nerf">Latent-NeRF</a> | <a href="https://fantasia3d.github.io/">Fantasia3D</a> | <a href="https://fabi92.github.io/textmesh/">TextMesh</a> | <br/> | <a href="https://zero123.cs.columbia.edu/">Zero-1-to-3</a> | <a href="https://guochengqian.github.io/project/magic123/">Magic123</a> | <a href="https://github.com/JunzheJosephZhu/HiFA">HiFA</a> | <a href="https://lukoianov.com/sdi">SDI</a> | <br /> | <a href="https://instruct-nerf2nerf.github.io/">InstructNeRF2NeRF</a> | <a href="https://control4darxiv.github.io/">Control4D</a> | </b> <p align="center"> <a href="https://colab.research.google.com/github/threestudio-project/threestudio/blob/main/threestudio.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg"> </a> <a href="https://huggingface.co/spaces/bennyguo/threestudio"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Gradio%20Demo-Huggingface-orange"></a> <a href="http://t23-g-01.threestudio.ai"><img src="https://img.shields.io/badge/Gradio%20Demo-Tencent-blue?logo=tencentqq&logoColor=white"></a> <a href="https://discord.gg/ejer2MAB8N"><img src="https://img.shields.io/badge/Discord-5865F2?logo=discord&logoColor=white"></a> </p> <p align="center"> Did not find what you want? 
Check out <a href="https://threestudio-project.github.io/threestudio-extensions/"><b>threestudio-extension</b></a> or submit a feature request <a href="https://github.com/threestudio-project/threestudio/discussions/46">here</a>! </p> <p align="center"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/ac6089a7-d88f-414c-96d6-a5e75616115a" width="68%"> </p> <p align="center"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/302a399e-d36f-453e-a595-1c7d120451d3" width="35%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/025e6980-baf2-4b5f-9c23-4f66ef847bf5" width="35%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/cfcd828f-daed-4d2e-abf1-29f69eb2ffbb" width="18%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/f04b6bdd-ef02-4ce7-b7c9-981f8bda419f" width="35%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/13ae104e-e020-4de9-a677-87f29067a1c0" width="35%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/c6337097-b5bd-4fe8-a03a-a68fb9260009" width="18%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/e41532fd-8f00-45b4-a473-26a9f1bca4f8" width="35%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/7b1b919d-d97a-4f50-afa3-6c1b7ecfe7b6" width="35%"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/24589363/8892898f-8bd8-43dc-a4ec-dd8d078af860" width="45%"> </p> <p align="center"><b> | <a href="https://github.com/HeliosZhao/Animate124/tree/threestudio">Animate-124</a> | <a href="https://github.com/DSaurus/threestudio-4dfy">4D-fy</a> | <a href="https://github.com/baaivision/GeoDream/tree/threestudio">GeoDream</a> | <a href="https://github.com/DSaurus/threestudio-dreamcraft3D">DreamCraft3D</a> | <a href="https://github.com/huanngzh/threestudio-dreamwaltz">Dreamwaltz</a> | <a href="https://github.com/KU-CVLAB/3DFuse-threestudio">3DFuse</a> | <a href="https://github.com/cxh0519/Progressive3D">Progressive3D</a> | <a href="https://github.com/cxh0519/threestudio-gaussiandreamer">GaussianDreamer</a> | <a href="https://github.com/DSaurus/threestudio-3dgs">Gaussian Splatting</a> | <a href="https://github.com/DSaurus/threestudio-mvdream">MVDream</a> | <a href="https://github.com/DSaurus/threestudio-meshfitting">Mesh-Fitting</a> | </b>

Installation

See installation.md for additional information, including installation via Docker.

The following steps have been tested on Ubuntu 20.04.

python3 -m virtualenv venv
. venv/bin/activate

# Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
# For instance, it caches the wheels of git packages to avoid unnecessarily rebuilding them later.
python3 -m pip install --upgrade pip
# torch1.12.1+cu113
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# or torch2.0.0+cu118
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install ninja
pip install -r requirements.txt
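
After installation, you can optionally run a quick sanity check (our suggestion, not part of the original instructions) to confirm that PyTorch sees your GPU:

# optional: verify the PyTorch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"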

Quickstart

Here we show some basic usage of threestudio. First let's train a DreamFusion model to create a classic pancake bunny.

If you experience unstable connections to Hugging Face, we suggest you either (1) set the environment variables TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1 HF_HUB_OFFLINE=1 before running, once all needed files have been fetched on the first run, so that later runs do not connect to Hugging Face at all, or (2) download the guidance model you use to a local folder following here and here, and set pretrained_model_name_or_path of both the guidance and the prompt processor to that local path.
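For example, after a first successful run has cached all required files, a fully offline run might look like this (a sketch using the quickstart config and prompt below):

# run offline once all models and tokenizers have been downloaded on a previous run
TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1 HF_HUB_OFFLINE=1 python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"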

# if you have agreed to the license of DeepFloyd IF and have >20GB VRAM
# please try this configuration for higher quality
python launch.py --config configs/dreamfusion-if.yaml --train --gpu 0 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"
# otherwise you could try with the Stable Diffusion model, which fits in 6GB VRAM
python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"

threestudio uses OmegaConf for flexible configurations. You can easily override any configuration in the YAML file by specifying arguments without --, as done with the prompt in the commands above. For all supported configurations, please see our documentation.
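For instance, the sketch below overrides the prompt, the total number of training steps, and the random seed in a single command (trainer.max_steps and seed are existing options used elsewhere in this README):

python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" trainer.max_steps=5000 seed=42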

The training lasts for 10,000 iterations. You can find visualizations of the current status in the trial directory, which defaults to [exp_root_dir]/[name]/[tag]@[timestamp], where exp_root_dir (outputs/ by default), name, and tag can be set in the configuration file. A 360-degree video is generated after training completes. During training, pressing ctrl+c once stops training and skips directly to the test stage, which generates the video; pressing ctrl+c a second time fully quits the program.
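As an illustration, a trial directory might be laid out roughly as follows (a sketch: the exact name, tag, and timestamp depend on your configuration and run time, and the save/ folder name is our assumption; ckpts/ and configs/ are referenced by the resume and export commands below):

outputs/dreamfusion-if/[tag]@[timestamp]/
  ckpts/      # checkpoints, e.g. last.ckpt, passed via resume=...
  configs/    # parsed.yaml and raw.yaml, reusable via --config
  save/       # visualizations and the final video (assumed name)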

Multi-GPU training

Multi-GPU training is supported, but may still be buggy. Note that data.batch_size is the batch size per rank (device). Also remember to set data.n_val_views to a multiple of the number of GPUs, as in the example below.

# this results in an effective batch size of 4 (number of GPUs) * 2 (data.batch_size) = 8
python launch.py --config configs/dreamfusion-if.yaml --train --gpu 0,1,2,3 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes" data.batch_size=2 data.n_val_views=4

If you define the CUDA_VISIBLE_DEVICES environment variable before calling launch.py, you don't need to specify --gpu; threestudio will then use all GPUs listed in CUDA_VISIBLE_DEVICES. For instance, the following command will automatically use GPUs 3 and 4:

CUDA_VISIBLE_DEVICES=3,4 python launch.py --config configs/dreamfusion-if.yaml --train system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"

This is particularly useful if you run launch.py in a cluster using a command that automatically picks GPU(s) and exports their IDs through CUDA_VISIBLE_DEVICES, e.g. through SLURM:

cd git/threestudio
. venv/bin/activate
srun --account mod3d --partition=g40 --gpus=1 --job-name=3s_bunny python launch.py --config configs/dreamfusion-if.yaml --train system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"

Resume from checkpoints

If you want to resume from a checkpoint, do:

# resume training from the last checkpoint; you may replace last.ckpt with any other checkpoint
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# if the training has completed, you can still continue training for a longer time by setting trainer.max_steps
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt trainer.max_steps=20000
# you can also perform testing using resumed checkpoints
python launch.py --config path/to/trial/dir/configs/parsed.yaml --test --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# note that the above commands use parsed configuration files from previous trials
# which will continue using the same trial directory
# if you want to save to a new trial directory, replace parsed.yaml with raw.yaml in the command

# only load weights from a saved checkpoint but don't resume training (i.e. don't load the optimizer state):
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 system.weights=path/to/trial/dir/ckpts/last.ckpt

Export Meshes

To export the scene to textured meshes, use the --export option. We currently support exporting to obj+mtl, or obj with vertex colors.

# this uses default mesh-exporter configurations which exports obj+mtl
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter
# specify system.exporter.fmt=obj to get obj with vertex colors
# you may also add system.exporter.save_uv=false to accelerate the process, suitable for a quick peek at the result
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.exporter.fmt=obj
# for NeRF-based methods (DreamFusion, Magic3D coarse, Latent-NeRF, SJC)
# you may need to adjust the isosurface threshold (25 by default) to get satisfying outputs
# decrease the threshold if the extracted model is incomplete, increase if it is extruded
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_threshold=10.
# use higher-resolution marching cubes to get more detailed models
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_method=mc-cpu system.geometry.isosurface_resolution=256

For all the options you can specify when exporting, see the documentation.

See here for example running commands for all our supported models. Refer here for tips on getting higher-quality results, and here for ways to reduce VRAM usage.

Gradio Web Interface

Launch the Gradio web interface by

python gradio_app.py launch

For feature requests, bug reports, or discussions about technical problems, please file an issue. If you want to discuss generation quality or showcase your results, feel free to join the discussion panel.

Supported Models

Score Distillation via Reparametrized DDIM (SDI) arXiv

SDI reconsiders how the noise term in DreamFusion is sampled. The paper demonstrates that the score distillation process can be seen as a reparametrization of 2D image sampling algorithms, in which case the noise added at each step of score distillation should take a very particular form. In DreamFusion (SDS), however, the noise is sampled randomly, which causes over-blurring. SDI approximates the correct noise term by inverting the DDIM process.
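As a rough sketch in standard diffusion notation (ours, not necessarily the paper's exact formulation): both methods perturb the current render x_0 to a noisy x_t and apply the same distillation update, but they differ in where the noise comes from:

$$
\begin{aligned}
\text{SDS:}\quad & x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, && \epsilon \sim \mathcal{N}(0, I),\\
\text{SDI:}\quad & x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\hat\epsilon, && \hat\epsilon \text{ obtained by DDIM inversion of } x_0 \text{ at noise level } t.
\end{aligned}
$$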

Notable differences from the paper: N/A.

Results obtained by threestudio (Stable Diffusion, 512x512)

<img alt="A_DSLR_photo_of_a_freshly_baked_round_loaf_of_sourdough_bread" src="https://github.com/user-attachments/assets/ec499869-502a-4bcc-b983-279643920b89" width="48%"> <img alt="a_photograph_of_a_knight" src="https://github.com/user-attachments/assets/71981e65-b8b5-4505-beab-41ef1cd545a9" width="48%">

Example running commands

python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="pumpkin head zombie, skinny, highly detailed, photorealistic"

python launch.py --config configs/sdi.yaml --train --gpu 1 system.prompt_processor.prompt="a photograph of a ninja"

python launch.py --config configs/sdi.yaml --train --gpu 2 system.prompt_processor.prompt="a zoomed out DSLR photo of a hamburger"

python launch.py --config configs/sdi.yaml --train --gpu 3 system.prompt_processor.prompt="bagel filled with cream cheese and lox"

ProlificDreamer arXiv

This is an unofficial experimental implementation! Please refer to https://github.com/thu-ml/prolificdreamer for the official code release.

Results obtained by threestudio (Stable Diffusion, 256x256 Stage1)

https://github.com/threestudio-project/threestudio/assets/19284678/27b42d8f-4aa4-4b47-8ea0-0f77db90fd1e

https://github.com/threestudio-project/threestudio/assets/19284678/ffcbbb01-3817-4663-a2bf-5e21a076bc3d

Results obtained by threestudio (Stable Diffusion, 256x256 Stage1, 512x512 Stage2+3)

https://github.com/threestudio-project/threestudio/assets/19284678/cfab881e-18dc-45fc-8384-7476f835b36e

# --------- Stage 1 (NeRF) --------- #
# object generation with 512x512 NeRF rendering, ~30GB VRAM
python launch.py --config configs/prolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple"
# if you don't have enough VRAM, try training with 64x64 NeRF rendering, ~15GB VRAM
python launch.py --config configs/prolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple" data.width=64 data.height=64 data.batch_size=1
# using the same model for pretrained and LoRA enables 64x64 training with <10GB VRAM
# but the quality is worse due to the use of an epsilon prediction model for LoRA training
python launch.py --config configs/prolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple" data.width=64 data.height=64 data.batch_size=1 system.guidance.pretrained_model_name_or_path_lora="stabilityai/stable-diffusion-2-1-base"
# use the patch-based renderer to reduce memory consumption, 512x512 resolution, ~20GB VRAM
python launch.py --config configs/prolificdreamer-patch.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple"
# scene generation with 512x512 NeRF rendering, ~30GB VRAM
python launch.py --config configs/prolificdreamer-scene.yaml --train --gpu 0 system.prompt_processor.prompt="Inside of a smart home, realistic detailed photo, 4k"

# --------- Stage 2 (Geometry Refinement) --------- #
# refine geometry with 512x512 rasterization, Stable Diffusion SDS guidance
python launch.py --config configs/prolificdreamer-geometry.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple" system.geometry_convert_from=path/to/stage1/trial/dir/ckpts/last.ckpt

# --------- Stage 3 (Texturing) --------- #
# texturing with 512x512 rasterization, Stable Diffusion VSD guidance
python launch.py --config configs/prolificdreamer-texture.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple" system.geometry_convert_from=path/to/stage2/trial/dir/ckpts/last.ckpt

HiFA arXiv

This is a re-implementation, missing some improvements from the original paper (coarse-to-fine NeRF sampling, kernel smoothing). For the original results, please refer to https://github.com/JunzheJosephZhu/HiFA

HiFA is essentially a suite of improvements, including image-space SDS, z-variance loss, and noise-strength annealing, and it is compatible with most optimization-based methods. We therefore provide three variants based on DreamFusion, ProlificDreamer, and Magic123. For the DreamFusion and ProlificDreamer variants we provide both a unified-guidance config and an SDS/VSD-guidance config; both should achieve the same results. Additionally, we make HiFA compatible with ProlificDreamer-scene.

Results obtained by threestudio (DreamFusion-HiFA, 512x512)

https://github.com/threestudio-project/threestudio/assets/24391451/c0030c66-0691-4ec2-8b79-d933101864a0

Results obtained by threestudio (ProlificDreamer-HiFA, 512x512)

https://github.com/threestudio-project/threestudio/assets/24391451/ff5dc4d0-d7d7-4a73-964e-84b8c48e2907

Results obtained by threestudio (Magic123-HiFA, 512x512)

https://github.com/threestudio-project/threestudio/assets/24391451/eb6f2f74-9143-4e26-8429-e300ad2d2b80

Example running commands

# ------ DreamFusion-HiFA ------- # (similar to original paper)
python launch.py --config configs/hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
python launch.py --config configs/experimental/unified-guidance/hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
# ------ ProlificDreamer-HiFA ------- #
python launch.py --config configs/prolificdreamer-hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
python launch.py --config configs/experimental/unified-guidance/prolificdreamer-hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
# ------ ProlificDreamer-scene-HiFA ------- #
python launch.py --config configs/prolificdreamer-scene-hifa.yaml --train --gpu 0 system.prompt_processor.prompt="A DSLR photo of a hamburger inside a restaurant"
# ------ Magic123-HiFA ------ #
python launch.py --config configs/magic123-hifa-coarse-sd.yaml --train --gpu 0 data.image_path=load/images/firekeeper_rgba.png system.prompt_processor.prompt="a toy figure of firekeeper from dark souls"
# We include a config for Magic123's refine stage, but haven't really run it, since the coarse-stage result already looks pretty decent.

DreamFusion arXiv

Results obtained by threestudio (DeepFloyd IF, batch size 8)

https://user-images.githubusercontent.com/19284678/236694848-38ae4ea4-554b-4c9d-b4c7-fba5bee3acb3.mp4

Example running commands

# uses DeepFloyd IF, requires ~15GB VRAM to extract text embeddings and ~10GB VRAM in training
# here we adopt random background augmentation to improve geometry quality
python launch.py --config configs/dreamfusion-if.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.background.random_aug=true
# uses Stable Diffusion, requires ~6GB VRAM in training
python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger"

Magic3D arXiv

Results obtained by threestudio (DeepFloyd IF, batch size 8; first row: coarse, second row: refine)

https://user-images.githubusercontent.com/19284678/236694858-0ed6939e-cd7a-408f-a94b-406709ae90c0.mp4

Example running commands

First train the coarse stage NeRF:

# uses DeepFloyd IF, requires ~15GB VRAM to extract text embeddings and ~10GB VRAM in training
python launch.py --config configs/magic3d-coarse-if.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger"
# uses Stable Diffusion, requires ~6GB VRAM in training
python launch.py --config configs/magic3d-coarse-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger"

Then convert the NeRF from the coarse stage to DMTet and train with differentiable rasterization:

# the refinement stage uses StableDiffusion, and requires ~5GB VRAM in training
python launch.py --config configs/magic3d-refine-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt
# if you're unsatisfied with the surface extracted using the default threshold (25)
# you can specify a threshold value using `system.geometry_convert_override`
# decrease the value if the extracted surface is incomplete, increase if it is extruded
python launch.py --config configs/magic3d-refine-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt system.geometry_convert_override.isosurface_threshold=10.

Score Jacobian Chaining arXiv

Results obtained by threestudio (Stable Diffusion)

https://user-images.githubusercontent.com/19284678/236694871-87a247c1-2d3d-4cbf-89df-450bfeac3aca.mp4

Notable differences from the paper: N/A.

Example running commands

# train with sjc guidance in latent space
python launch.py --config configs/sjc.yaml --train --gpu 0 system.prompt_processor.prompt="A high quality photo of a delicious burger"
# train with sjc guidance in latent space, trump figure
python launch.py --config configs/sjc.yaml --train --gpu 0 system.prompt_processor.prompt="Trump figure" trainer.max_steps=30000 system.loss.lambda_emptiness="[15000,10000.0,200000.0,15001]" system.optimizer.params.background.lr=0.05 seed=42

Latent-NeRF arXiv

Results obtained by threestudio (Stable Diffusion)

https://user-images.githubusercontent.com/19284678/236694876-5a270347-6a41-4429-8909-44c90c554e06.mp4

Notable differences from the paper: N/A.

We currently only implement Latent-NeRF for text-guided and Sketch-Shape for (text,shape)-guided 3D generation. Latent-Paint is not implemented yet.

Example running commands

# train Latent-NeRF in Stable Diffusion latent space
python launch.py --config configs/latentnerf.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger"
# refine Latent-NeRF in RGB space
python launch.py --config configs/latentnerf-refine.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.weights=path/to/latent/stage/trial/dir/ckpts/last.ckpt

# train Sketch-Shape in Stable Diffusion latent space
python launch.py --config configs/sketchshape.yaml --train --gpu 0 system.guide_shape=load/shapes/teddy.obj system.prompt_processor.prompt="a teddy bear in a tuxedo"
# refine Sketch-Shape in RGB space
python launch.py --config configs/sketchshape-refine.yaml --train --gpu 0 system.guide_shape=load/shapes/teddy.obj system.prompt_processor.prompt="a teddy bear in a tuxedo" system.weights=path/to/latent/stage/trial/dir/ckpts/last.ckpt

Fantasia3D arXiv

Results obtained by threestudio (Stable Diffusion)

https://user-images.githubusercontent.com/19284678/236694880-33b0db21-4530-47f1-9c3b-c70357bc84b3.mp4

Results obtained by threestudio (Stable Diffusion, mesh initialization)

https://github.com/threestudio-project/threestudio/assets/19284678/762903c1-665b-47b5-a2c2-bd7021a9e548.mp4

<p align="center"> <img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/2d22e30f-4a32-454a-a06e-d6e6bd2a1b96.png" width="100%"> </p>

Example running commands

# --------- Geometry --------- #
python launch.py --config configs/fantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt="a DSLR photo of an ice cream sundae"
# Fantasia3D relies heavily on the initialized SDF shape
# the default shape is a sphere with radius 0.5
# change the shape initialization to match your input prompt
python launch.py --config configs/fantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt="The leaning tower of Pisa" system.geometry.shape_init=ellipsoid system.geometry.shape_init_params="[0.3,0.3,0.8]"
# or you can initialize from a mesh
# here shape_init_params is the scale of the shape
# also make sure to input the correct up and front axes (one of +x, +y, +z, -x, -y, -z)
python launch.py --config configs/fantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt="hulk" system.geometry.shape_init=mesh:load/shapes/human.obj system.geometry.shape_init_params=0.9 system.geometry.shape_init_mesh_up=+y system.geometry.shape_init_mesh_front=+z
# --------- Texture --------- #
# to train PBR texture continued from a geometry checkpoint:
python launch.py --config configs/fantasia3d-texture.yaml --train --gpu 0 system.prompt_processor.prompt="a DSLR photo of an ice cream sundae" system.geometry_convert_from=path/to/geometry/stage/trial/dir/ckpts/last.ckpt

TextMesh arXiv

Results obtained by threestudio (DeepFloyd IF, batch size 4)

https://github.com/threestudio-project/threestudio/assets/19284678/72217cdd-765a-475b-92d0-4ab62bf0f57a

Example running commands

# uses DeepFloyd IF, requires ~15GB VRAM
python launch.py --config configs/textmesh-if.yaml --train --gpu 0 system.prompt_processor.prompt="lib:cowboy_boots"

Control4D arXiv

This is an experimental implementation of Control4D using threestudio! Control4D will release the full code, including static and dynamic editing, after paper acceptance.

Results obtained by threestudio (512x512)

https://github.com/threestudio-project/threestudio/assets/24589363/97d9aadd-32c7-488f-9543-6951b285d588

We currently don't support dynamic editing.

Download the data sample of Control4D using this link.

Example running commands

# --------- Control4D --------- #
# static editing with 128x128 NeRF + 512x512 GAN rendering, ~20GB VRAM
python launch.py --config configs/control4d-static.yaml --train --gpu 0 data.dataroot="YOUR_DATAROOT/twindom" system.prompt_processor.prompt="Elon Musk wearing red shirt, RAW photo, (high detailed skin:1.2), 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"

InstructNeRF2NeRF arXiv

Results obtained by threestudio

https://github.com/threestudio-project/threestudio/assets/24589363/7aa43a2d-87d7-4ef5-94b6-f778ddb041b5

Download the data sample of InstructNeRF2NeRF using this link.

Example running commands

# --------- InstructNeRF2NeRF --------- #
# 3D editing with NeRF patch-based rendering, ~20GB VRAM
python launch.py --config configs/instructnerf2nerf.yaml --train --gpu 0 data.dataroot="YOUR_DATAROOT/face" data.camera_layout="front" data.camera_distance=1 data.eval_interpolation=[1,3,50] system.prompt_processor.prompt="Turn him into Albert Einstein"

Magic123 arXiv

Results obtained by threestudio (Zero123 + Stable Diffusion)

https://github.com/threestudio-project/threestudio/assets/19284678/335a58a8-8fee-485b-ac27-c55a16f4a673

Example running commands

First train the coarse stage NeRF:

# Zero123 + Stable Diffusion, ~12GB VRAM
# data.image_path must point to a 4-channel RGBA image
# system.prompt_processor.prompt must be specified
python launch.py --config configs/magic123-coarse-sd.yaml --train --gpu 0 data.image_path=load/images/hamburger_rgba.png system.prompt_processor.prompt="a delicious hamburger"

Then convert the NeRF from the coarse stage to DMTet and train with differentiable rasterization:

# Zero123 + Stable Diffusion, ~10GB VRAM
# data.image_path must point to a 4-channel RGBA image
# system.prompt_processor.prompt must be specified
python launch.py --config configs/magic123-refine-sd.yaml --train --gpu 0 data.image_path=load/images/hamburger_rgba.png system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt
# if you're unsatisfied with the surface extracted using the default threshold (25)
# you can specify a threshold value using `system.geometry_convert_override`
# decrease the value if the extracted surface is incomplete, increase if it is extruded
python launch.py --config configs/magic123-refine-sd.yaml --train --gpu 0 data.image_path=load/images/hamburger_rgba.png system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt system.geometry_convert_override.isosurface_threshold=10.

Stable Zero123

Installation

Download the pretrained Stable Zero123 checkpoint stable-zero123.ckpt into load/zero123 from https://huggingface.co/stabilityai/stable-zero123.
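
For example, one way to fetch the checkpoint (a sketch; assumes the huggingface_hub CLI is installed and that you have accepted the model license on Hugging Face):

cd load/zero123
huggingface-cli download stabilityai/stable-zero123 stable-zero123.ckpt --local-dir .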

Results obtained by threestudio (Stable Zero123 vs Zero123-XL)

Direct multi-view image generation: if you only want to generate multi-view images, please refer to threestudio-mvimg-gen. This extension can use Stable Zero123 to directly generate images from multi-view perspectives.

Example running commands

  1. Take an image of your choice, or generate one from text using your favourite AI image generator such as SDXL Turbo (https://clipdrop.co/stable-diffusion-turbo), e.g. "A simple 3D render of a friendly dog"
  2. Remove its background using Clipdrop (https://clipdrop.co/remove-background)
  3. Save to load/images/, preferably with _rgba.png as the suffix
  4. Run Zero-1-to-3 with the Stable Zero123 ckpt:
python launch.py --config configs/stable-zero123.yaml --train --gpu 0 data.image_path=./load/images/hamburger_rgba.png

IMPORTANT NOTE: This is an experimental implementation and we're constantly improving the quality.

IMPORTANT NOTE: This implementation extends the Zero-1-to-3 implementation below, and is heavily inspired by the Zero-1-to-3 implementation in https://github.com/ashawkey/stable-dreamfusion! extern/ldm_zero123 is borrowed from stable-dreamfusion/ldm.

Zero-1-to-3 arXiv

Installation

Download the pretrained Zero123-XL weights into load/zero123:

cd load/zero123
wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt

Results obtained by threestudio (Zero-1-to-3)

https://github.com/threestudio-project/threestudio/assets/22424247/f4e7b66f-7a46-4f9f-8fcd-750300cef651

IMPORTANT NOTE: This is an experimental implementation and we're constantly improving the quality.

IMPORTANT NOTE: This implementation is heavily inspired by the Zero-1-to-3 implementation in https://github.com/ashawkey/stable-dreamfusion! extern/ldm_zero123 is borrowed from stable-dreamfusion/ldm.

Example running commands

  1. Take an image of your choice, or generate one from text using your favourite AI image generator such as Stable Diffusion XL (https://clipdrop.co/stable-diffusion), e.g. "A simple 3D render of a friendly dog"
  2. Remove its background using Clipdrop (https://clipdrop.co/remove-background)
  3. Save to load/images/, preferably with _rgba.png as the suffix
  4. Run Zero-1-to-3:
python launch.py --config configs/zero123.yaml --train --gpu 0 data.image_path=./load/images/dog1_rgba.png

For more scripts for Zero-1-to-3, please check threestudio/scripts/run_zero123.sh.

Previous Zero-1-to-3 weights are available at https://huggingface.co/cvlab/zero123-weights/. You can download them to load/zero123 as above and point system.guidance.pretrained_model_name_or_path to the downloaded file.
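
For example (a sketch; 105000.ckpt is an illustrative filename from that repository, substitute whichever checkpoint you downloaded):

# use an older Zero-1-to-3 checkpoint instead of the default weights
python launch.py --config configs/zero123.yaml --train --gpu 0 data.image_path=./load/images/dog1_rgba.png system.guidance.pretrained_model_name_or_path=./load/zero123/105000.ckpt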

Guidance evaluation

threestudio also supports evaluating the guidance during training. If system.freq.guidance_eval is set to a value > 0, it periodically saves the rendered image, the noisy image (the amount of added noise is noted at the top left), the 1-step-denoised image, the 1-step prediction of the original image, and the fully denoised image.

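A minimal sketch of enabling it (the frequency value of 100 is an arbitrary example):

python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.freq.guidance_eval=100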

More to come, please stay tuned.

If you would like to contribute a new method to threestudio, see here.

Prompt Library

For easier comparison, we collect the 397 preset prompts from the DreamFusion website in this file. You can use these prompts by setting system.prompt_processor.prompt=lib:keyword1_keyword2_..._keywordN. Note that the prompt must start with lib: and that keywords are separated by _. The prompt processor matches the keywords against all prompts in the library and only succeeds if there is exactly one match; the selected prompt is printed to the console. Also note that this syntax cannot reach every prompt in the library, since some prompts are subsets of other prompts. We will improve this feature in the future.
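For example, the following command uses the same lib: syntax as the TextMesh example above to select the library prompt matched by the keywords cowboy and boots:

python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="lib:cowboy_boots"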

Tips on Improving Quality

It's important to note that existing techniques that lift 2D T2I models to 3D cannot consistently produce satisfying results. Results from great papers like DreamFusion and Magic3D are (to some extent) cherry-picked, so don't be frustrated if you do not get what you expected on your first trial. The following sections collect some tips that may help you improve the generation quality.

VRAM Optimization

If you encounter a CUDA OOM error, you can usually reduce VRAM usage by lowering the rendering resolution and batch size, or by switching to a lighter guidance model, as illustrated in the DreamFusion and ProlificDreamer commands above.
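
For example, a lower-memory variant of the quickstart command might look like this (a sketch; data.width, data.height, and data.batch_size are the same options used in the ProlificDreamer example above, and their effect can depend on the config):

python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" data.width=64 data.height=64 data.batch_size=1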

Documentation

threestudio uses OmegaConf to manage configurations. You can change virtually anything inside the YAML configuration file, or override it by adding command-line arguments without --. We list all configurable arguments in our documentation. Happy experimenting!

wandb (Weights & Biases) logging

To enable the (experimental) wandb support, set system.loggers.wandb.enable=true, e.g.:

python launch.py --config configs/zero123.yaml --train --gpu 0 system.loggers.wandb.enable=true

If you're using a corporate wandb server, you may first need to log in to your wandb instance, e.g.: wandb login --host=https://COMPANY_XYZ.wandb.io --relogin

By default, runs are given a random name and recorded in the threestudio project. You can override the name to make it more descriptive, e.g.:

python launch.py --config configs/zero123.yaml --train --gpu 0 system.loggers.wandb.enable=true system.loggers.wandb.name="zero123xl_accum;bs=4;lr=0.05"

Contributing to threestudio

To set up a development environment, install the development dependencies first:

pip install -r requirements-dev.txt

Code Structure

Here we briefly introduce the code structure of this project. More detailed documentation will follow in the future.

Credits

threestudio is built on, and greatly inspired by, many amazing open-source projects. Thanks to the maintainers of these projects for their contribution to the community!

Citing threestudio

If you find threestudio helpful, please consider citing:

@Misc{threestudio2023,
  author =       {Yuan-Chen Guo and Ying-Tian Liu and Ruizhi Shao and Christian Laforte and Vikram Voleti and Guan Luo and Chia-Hao Chen and Zi-Xin Zou and Chen Wang and Yan-Pei Cao and Song-Hai Zhang},
  title =        {threestudio: A unified framework for 3D content generation},
  howpublished = {\url{https://github.com/threestudio-project/threestudio}},
  year =         {2023}
}