Make-It-3D: Jittor Implementation

We provide a Jittor implementation of our paper "Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior".

<!-- ![Teaser](teaser.png) --> <div class="half"> <img src="demo/bunny-cake.png" width="128"><img src="demo/bunny-cake-rgb.gif" width="128"><img src="demo/bunny-cake-normal.gif" width="128"><img src="demo/castle.png" width="128"><img src="demo/castle-rgb.gif" width="128"><img src="demo/castle-normal.gif" width="128"> </div> <div class="half"> <img src="demo/house.png" width="128"><img src="demo/house-rgb.gif" width="128"><img src="demo/house-normal.gif" width="128"><img src="demo/jay.png" width="128"><img src="demo/jay-rgb.gif" width="128"><img src="demo/jay-normal.gif" width="128"> </div>

Project page | Paper

<!-- <br> -->

Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, and Dong Chen.

<!-- <br> -->

Abstract

In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it essentially involves estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by incorporating constraints from the reference image at the frontal view and diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further elevates the realism with diffusion prior while leveraging the high-quality textures from the reference image. Extensive experiments demonstrate that our method outperforms prior works by a large margin, resulting in faithful reconstructions and impressive visual quality. Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects and enables various applications such as text-to-3D creation and texture editing.

Todo (Latest update: 2024/07/16)

Demo of 360° geometry

<div class="half"> <img src="demo/teddy.png" width="128"><img src="demo/teddy-rgb.gif" width="128"><img src="demo/teddy-normal.gif" width="128"><img src="demo/teddy-2.png" width="128"><img src="demo/teddy-2-rgb.gif" width="128"><img src="demo/teddy-2-normal.gif" width="128"> </div>

SAM + Make-It-3D

<div class="half"> <img src="demo/corgi-demo.png" height="170"><img src="demo/corgi.png" width="170"><img src="demo/corgi-rgb.gif" width="170"><img src="demo/corgi-normal.gif" width="170"> </div>

Installation

1. Requirements

2. Download jittor-related libraries

Please download the required folders from here. The directory structure of the downloaded folder is as follows:

makeit3d_requirement/
│
├── jittor-1.3.9.7/
│   ├── setup.py
│   └── ...
│
├── jtorch/
│   ├── setup.py
│   └── ...
│
├── diffuser_jittor/
│   ├── setup.py
│   └── ...
│
├── jclip/
│   ├── setup.py
│   └── ...
│
├── JDiffusion/
│   ├── setup.py
│   └── ...
│
├── transformers_jittor/
│   ├── setup.py
│   └── ...
│
├── JNeRF/
│   ├── setup.py
│   └── ...
└── ...

3. Compile the jittor-related libraries.

After obtaining the makeit3d_requirement folder, you need to compile each of these libraries. Run the following command in the directory containing the setup.py file of each library listed above:

pip install -e .

Note: Due to the dependencies between the components, it is best to compile them in the order shown in the directory tree above.
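
If it helps, the compile step can be scripted. Below is a minimal sketch that assumes the makeit3d_requirement folder sits in the current directory and that the folder names match the tree above; adjust the list if your download differs:

cd makeit3d_requirement
for lib in jittor-1.3.9.7 jtorch diffuser_jittor jclip JDiffusion transformers_jittor JNeRF; do
    pip install -e "./$lib"   # editable install, equivalent to running `pip install -e .` inside each folder
done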

4. Install other dependencies

Other dependencies:

pip install -r requirements.txt

5. Download the pre-trained model


Training

Coarse stage

We use a progressive training strategy to generate full 360° 3D geometry. Run the command below, replacing the workspace name NAME, the reference image path IMGPATH, and the prompt PROMPT describing the image. We first optimize the scene under frontal camera views.

python main.py --workspace ${NAME} --ref_path "${IMGPATH}" --phi_range 135 225 --iters 2000 --text ${PROMPT}

We provide examples in the results folder; you can run the following commands for a quick start:

python main.py --workspace teddy --ref_path demo/teddy.png --phi_range 135 225 --iters 2000 --text "a teddy bear"
python main.py --workspace teddy2 --ref_path demo/teddy-2.png --phi_range 135 225 --iters 2000 --text "a teddy bear"
python main.py --workspace ${NAME} --ref_path "${IMGPATH}" --phi_range 135 225  --backbone vanilla --iters 10000 --text ${PROMPT}
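
The later progressive stages widen the viewing range toward the full 360°. As a hypothetical sketch only (the wider --phi_range and iteration count below are assumptions for illustration, not values taken from this repo), a follow-up run could look like:

python main.py --workspace ${NAME} --ref_path "${IMGPATH}" --phi_range 0 360 --iters 5000 --text ${PROMPT}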

Refine stage

After the coarse-stage training, you can simply add the --refine flag for refine-stage training. We optimize the scene under frontal camera views.

python main.py --workspace ${NAME} --ref_path "${IMGPATH}" --phi_range 135 225 --refine_iters 3000  --refine 

We provide an example for the refine stage. Before refine-stage training, you should download the pretrained checkpoint into your workspace.

You can refine the teddy bear texture with the following command:

python main.py --workspace teddy --ref_path "demo/teddy.png" --refine --refine_iters 3000 --iters 2000 --text "a teddy bear"

Important Note

Hallucinating 3D geometry and generating novel views from a single image of a general object is a challenging task. While our method demonstrates a strong capability for creating 3D content from most images with a single centered object, it may still have difficulty reconstructing solid geometry in complex cases. If you encounter any bugs, please feel free to contact us.

Citation

If you find this code helpful for your research, please cite:

@InProceedings{Tang_2023_ICCV,
    author    = {Tang, Junshu and Wang, Tengfei and Zhang, Bo and Zhang, Ting and Yi, Ran and Ma, Lizhuang and Chen, Dong},
    title     = {Make-It-3D: High-fidelity 3D Creation from A Single Image with Diffusion Prior},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {22819-22829}
}

Acknowledgments

This code borrows heavily from Stable-Dreamfusion.