


Implementation of the CVPR 2024 paper "DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior"


<p align="center"> <img src="./assets/overview.png" width="1080px"/> <br> <em>DreamControl generates a coarse 3D self-prior and refines texture with control-based distillation, aiming to create 3D content with both highly consistent geometry and high-fidelity textures.</em> </p>


Our source code is built on the original threestudio. See installation.md for additional information.

Quick Start

We provide two ways to load shape conditions for our control-based score distillation.

# load an obj file as a condition:
python launch.py --config ./configs/dreamcontrol.yaml --train --gpu 0 system.prompt_processor.prompt="{your text prompt}" system.geometry_c.shape_init="mesh:{path of the obj file}"
# load a threestudio checkpoint as a condition:
python launch.py --config ./configs/dreamcontrol.yaml --train --gpu 0 system.prompt_processor.prompt="{your text prompt}" system.geometry_c.shape_init="{path of the last checkpoint file}"
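
The two commands above differ only in the value passed to system.geometry_c.shape_init: an OBJ path takes a "mesh:" prefix, while a checkpoint path is passed as-is. A minimal sketch of a helper that assembles the command is shown below. The function names `shape_init_value` and `build_command` are hypothetical and not part of this repository, and choosing the prefix from the `.obj` extension is an assumption made for illustration.

```python
import shlex
from typing import List


def shape_init_value(path: str) -> str:
    """Return the value for system.geometry_c.shape_init.

    Assumption: a path ending in .obj is a mesh and gets the "mesh:" prefix;
    anything else is treated as a threestudio checkpoint and passed unchanged.
    """
    if path.lower().endswith(".obj"):
        return f"mesh:{path}"
    return path


def build_command(prompt: str, condition_path: str, gpu: int = 0) -> List[str]:
    """Assemble the launch.py argument list used in the Quick Start commands."""
    return [
        "python", "launch.py",
        "--config", "./configs/dreamcontrol.yaml",
        "--train",
        "--gpu", str(gpu),
        f"system.prompt_processor.prompt={prompt}",
        f"system.geometry_c.shape_init={shape_init_value(condition_path)}",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line; run it from the repository root.
    print(shlex.join(build_command("a corgi", "./a_corgi/ckpts/last.ckpt")))
```

Passing the returned list to `subprocess.run` avoids manual shell quoting of prompts that contain spaces or quotes.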

For example, you can use the template geometries we provide in ./load/rot_shapes, or pre-process a new shape so that it has the correct orientation (note that the original geometries in threestudio's ./load/shapes are oriented differently from ours):

# rotate the geometry
cd ./load/rot_shapes
python rot.py ../shapes/nascar.obj

# optimize a NeRF with DreamControl
cd ../..
python launch.py --config ./configs/dreamcontrol.yaml --train --gpu 0 system.prompt_processor.prompt="A Tesla Model3 sedan" system.geometry_c.shape_init="mesh:./load/rot_shapes/nascar.obj"
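
For shapes from other sources, re-orienting a mesh amounts to rotating its vertex positions and normals. The sketch below illustrates this on OBJ text in plain Python. It is not the code in rot.py: the function name `rotate_obj_text` is hypothetical, and the default axis and angle are placeholders, since the rotation that rot.py applies is not stated here.

```python
import math


def rotate_obj_text(obj_text: str, axis: str = "x", degrees: float = 90.0) -> str:
    """Rotate the 'v' and 'vn' records of an OBJ file about one coordinate axis.

    Assumption: axis and angle are illustrative defaults, not the values
    used by rot.py. All other records (faces, texture coords) pass through.
    """
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    out = []
    for line in obj_text.splitlines():
        parts = line.split()
        if parts and parts[0] in ("v", "vn") and len(parts) >= 4:
            x, y, z = (float(t) for t in parts[1:4])
            if axis == "x":
                x, y, z = x, c * y - s * z, s * y + c * z
            elif axis == "y":
                x, y, z = c * x + s * z, y, -s * x + c * z
            elif axis == "z":
                x, y, z = c * x - s * y, s * x + c * y, z
            else:
                raise ValueError(f"unknown axis: {axis!r}")
            # Keep any trailing fields (e.g. optional w or vertex colors) as-is.
            fields = [parts[0], f"{x:.6f}", f"{y:.6f}", f"{z:.6f}", *parts[4:]]
            out.append(" ".join(fields))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "v 0 1 0\nv 1 0 0\nf 1 2 1\n"
    print(rotate_obj_text(sample, axis="x", degrees=90.0))
```

A quick way to check the orientation is to load the rotated file in a mesh viewer and compare it with one of the provided shapes in ./load/rot_shapes.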

Alternatively, you can use a threestudio checkpoint. We provide an example here. Download and unzip it, then run:

python launch.py --config ./configs/dreamcontrol.yaml --train --gpu 0 system.prompt_processor.prompt="a corgi" system.geometry_c.shape_init="./a_corgi/ckpts/last.ckpt"



<table class="center"> <tr> <td width=20% align="center"><img src="./assets/videos/1.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/2.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/3.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/4.gif" raw=true></td> </tr> <tr> <td width=20% align="center">"Lionel Messi in a suit, holding the Ballon d'Or"</td> <td width=20% align="center">"Elon musk, using a laptop"</td> <td width=20% align="center">"Batman is riding a moto"</td> <td width=20% align="center">"An astronaut is riding a horse"</td> </tr> <tr> <td width=20% align="center"><img src="./assets/videos/5.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/6.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/7.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/8.gif" raw=true></td> </tr> <tr> <td width=20% align="center">"A chimpanzee dressed like Henry VIII king of England"</td> <td width=20% align="center">"Michelangelo style statue of dog reading news on a cellphone"</td> <td width=20% align="center">"A blue jay standing on a large basket of rainbow macarons"</td> <td width=20% align="center">"A baby bunny sitting on top of a stack of pancakes"</td> </tr> <tr> <td width=20% align="center"><img src="./assets/videos/9.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/10.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/11.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/12.gif" raw=true></td> </tr> <tr> <td width=20% align="center">"A model of a house in Tudor style"</td> <td width=20% align="center">"Tower Bridge made out of gingerbread and candy"</td> <td width=20% align="center">"A highly detailed sand castle"</td> <td width=20% align="center">"A pavilion in a Chinese garden"</td> </tr> <tr> <td width=20% align="center"><img 
src="./assets/videos/13.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/14.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/15.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/16.gif" raw=true></td> </tr> <tr> <td width=20% align="center">"A gundam"</td> <td width=20% align="center">"Woody from Toy Story"</td> <td width=20% align="center">"Captain America"</td> <td width=20% align="center">"Spider Man"</td> </tr> <tr> <td width=20% align="center"><img src="./assets/videos/17.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/18.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/19.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/20.gif" raw=true></td> </tr> <tr> <td width=20% align="center">"A wizard"</td> <td width=20% align="center">"A corgi"</td> <td width=20% align="center">"A Tesla Model3 sedan"</td> <td width=20% align="center">"A classic Packard car"</td> </tr> <tr> <td width=20% align="center"><img src="./assets/videos/21.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/22.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/23.gif" raw=true></td> <td width=20% align="center"><img src="./assets/videos/24.gif" raw=true></td> </tr> <tr> <td width=20% align="center">"A teddy bear"</td> <td width=20% align="center">"A plate piled high with chocolate chip cookies"</td> <td width=20% align="center">"A plate of fried chicken and waffles with maple syrup on them"</td> <td width=20% align="center">"A vase with pink flowers"</td> </tr> </table>


Our code is built on threestudio and ControlLoRA.


  title={Dreamcontrol: Control-based text-to-3d generation with 3d self-prior},
  author={Huang, Tianyu and Zeng, Yihan and Zhang, Zhilu and Xu, Wan and Xu, Hang and Xu, Songcen and Lau, Rynson WH and Zuo, Wangmeng},
  journal={arXiv preprint arXiv:2312.06439},