Home

Awesome

taichi_3d_gaussian_splatting

An unofficial implementation of the paper *3D Gaussian Splatting for Real-Time Radiance Field Rendering*, written in Taichi Lang.

What does 3D Gaussian Splatting do?

Training:

The algorithm takes images from multiple views, a sparse point cloud, and camera poses as input, uses a differentiable rasterizer to train the point cloud, and outputs a dense point cloud with extra features (covariance, color information, etc.).

<img src="images/image_from_multi_views.png" alt="drawing" width="200"/>
If we view the training process as a module, it can be described as:

```mermaid
graph LR
    A[ImageFromMultiViews] --> B((Training))
    C[sparsePointCloud] --> B
    D[CameraPose] --> B
    B --> E[DensePointCloudWithExtraFeatures]
```

Inference:

The algorithm takes the dense point cloud with extra features and any camera pose as input, and uses the same rasterizer to render an image from that camera pose.

```mermaid
graph LR
    C[DensePointCloudWithExtraFeatures] --> B((Inference))
    D[NewCameraPose] --> B
    B --> E[Image]
```

An example of inference result:

https://github.com/wanmeihuali/taichi_3d_gaussian_splatting/assets/18469933/cc760693-636b-4157-ae85-33813f3da54d

Because of the nice properties of point clouds, the algorithm handles scene/object merging easily compared to other NeRF-like algorithms.

https://github.com/wanmeihuali/taichi_3d_gaussian_splatting/assets/18469933/bc38a103-e435-4d35-9239-940e605b4552

<details><summary>other example result</summary> <p>

Top left: result from this repo (30k iterations); top right: ground truth; bottom left: normalized depth; bottom right: normalized number of points per pixel.

</p> </details>

Why taichi?

Current status

The repo is now tested with the dataset provided by the official implementation. For the Truck dataset, the repo achieves slightly higher PSNR than the official implementation with only 1/5 to 1/4 the number of points. However, the training/inference speed is still slower than the official implementation.

The results for the official implementation and this implementation are tested on the same dataset. I notice that the result from the official implementation is slightly different from their paper; the reason may be a difference in testing resolution.

| Dataset | Source | PSNR | SSIM | #points |
| --- | --- | --- | --- | --- |
| Truck (7k) | paper | 23.51 | 0.840 | - |
| Truck (7k) | official implementation | 23.22 | - | 1.73e6 |
| Truck (7k) | this implementation | 23.76 | 0.836 | ~2.3e5 |
| Truck (30k) | paper | 25.187 | 0.879 | - |
| Truck (30k) | official implementation | 24.88 | - | 2.1e6 |
| Truck (30k) | this implementation | 25.21 | 0.865 | ~4.3e5 |

Truck(30k)(recent best result):

| train:iteration | train:l1loss | train:loss | train:num_valid_points | train:psnr | train:ssim | train:ssimloss | val:loss | val:psnr | val:ssim |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 30000 | 0.02785 | 0.04742 | 428687 | 25.662 | 0.8743 | 0.1257 | 0.05369 | 25.215 | 0.8645 |
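
PSNR in the tables above is the standard peak signal-to-noise ratio between the rendered image and the ground truth. A minimal pure-Python sketch, treating images as flat lists of floats in [0, 1]:

```python
import math

def psnr(rendered, ground_truth, max_val=1.0):
    """Peak signal-to-noise ratio: 10 * log10(max_val^2 / MSE)."""
    mse = sum((r - g) ** 2 for r, g in zip(rendered, ground_truth)) / len(rendered)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy example: two 4-pixel "images"; MSE = 0.005, so PSNR ≈ 23.01 dB.
print(psnr([0.5, 0.5, 0.5, 0.5], [0.5, 0.6, 0.5, 0.4]))
```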

Installation

  1. Prepare an environment containing pytorch and torchvision.
  2. Clone the repo and cd into the directory.
  3. Run the following commands:

```bash
pip install -r requirements.txt
pip install -e .
```

All dependencies can be installed by pip; pytorch/torchvision can be installed by conda. The code is tested on Ubuntu 20.04.2 LTS with python 3.10.10, on an RTX 3090 with CUDA 12.1. The code is not tested on other platforms, but it should work on them with minor modifications.

Dataset

The algorithm requires a point cloud of the whole scene, camera parameters, and ground-truth images. The point cloud is stored in parquet format; the camera parameters and ground-truth image lists are stored in json format; the running config is stored in yaml format. A script to build a dataset from colmap output is provided, and it is also possible to build a dataset from raw data.
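
As a rough illustration of this layout, the snippet below writes a one-frame camera file with stdlib json. All key names here are illustrative assumptions, not the repo's actual schema (see the provided dataset-building script for that); the parquet point cloud is only described in a comment.

```python
import json

# Hypothetical schema -- the real keys used by this repo may differ.
camera_frames = [
    {
        "image_path": "image/000000.png",        # ground-truth image
        "camera_pose": [[1, 0, 0, 0],            # 4x4 world-to-camera transform
                        [0, 1, 0, 0],
                        [0, 0, 1, 0],
                        [0, 0, 0, 1]],
        "camera_intrinsics": [[500.0, 0.0, 480.0],
                              [0.0, 500.0, 270.0],
                              [0.0, 0.0, 1.0]],
        "camera_height": 540,
        "camera_width": 960,
    },
]

with open("train.json", "w") as f:
    json.dump(camera_frames, f, indent=2)

# The sparse point cloud is a table of xyz (plus optional color) rows
# stored as parquet, e.g. with pandas:
#   pd.DataFrame(points, columns=["x", "y", "z"]).to_parquet("point_cloud.parquet")
```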

Train on Tanks and Temples Truck scene

<details><summary>CLICK ME</summary> <p> **Disclaimer**: users are required to get permission from the original dataset provider. Any usage of the data must obey the license of the dataset owner.

The Truck scene in the Tanks and Temples dataset is the main dataset used to develop this repo. We use a downsampled version of the images in most experiments. The camera poses and the sparse point cloud can be easily generated by colmap. The preprocessed images, pregenerated camera poses, and point cloud for the Truck scene can be downloaded from this link.

Please download the images into a folder named image and put it under the root directory of this repo. The camera poses and sparse point cloud should be placed under data/tat_truck_every_8_test. The folder structure should look like this:

```
├── data
│   ├── tat_truck_every_8_test
│   │   ├── train.json
│   │   ├── val.json
│   │   ├── point_cloud.parquet
├── image
│   ├── 000000.png
│   ├── 000001.png
```

The config file config/tat_truck_every_8_test.yaml is provided. It specifies the dataset paths, the training parameters, and the network parameters, and is largely self-explanatory. Training can be started by running:

```bash
python gaussian_point_train.py --train_config config/tat_truck_every_8_test.yaml
```
</p> </details>

Train on Example Object (boot)

<details><summary>CLICK ME</summary> <p>

It is actually a random free mesh from the Internet, which I believe is free to use. BlenderNerf is used to generate the dataset. The preprocessed images, pregenerated camera poses, and point cloud for the boot scene can be downloaded from this link. Please download the images into a folder named image and put it under the root directory of this repo. The camera poses and sparse point cloud should be placed under data/boots_super_sparse. The folder structure should look like this:

```
├── data
│   ├── boots_super_sparse
│   │   ├── boots_train.json
│   │   ├── boots_val.json
│   │   ├── point_cloud.parquet
├── image
│   ├── images_train
│   │   ├── COS_Camera.001.png
│   │   ├── COS_Camera.002.png
│   │   ├── ...
```

Note that because the images in this dataset have a higher resolution (1920x1080), training on it is actually slower than training on the Truck scene.

</p> </details>

Train on dataset generated by colmap

<details><summary>CLICK ME</summary> <p> </p> </details>

Train on dataset with Instant-NGP format with extra mesh

<details><summary>CLICK ME</summary> <p>
```bash
python tools/prepare_InstantNGP_with_mesh.py \
    --transforms_train {path to train transform file} \
    --transforms_test {path to val transform file, if not provided, val will be sampled from train} \
    --mesh_path {path to mesh file} \
    --mesh_sample_points {number of points to sample on the mesh} \
    --val_sample {if sample val from train, sample by every n frames} \
    --image_path_prefix {path prefix to the image, usually the path to the folder containing the image folder} \
    --output_path {path to output folder}
python gaussian_point_train.py --train_config {path to config yaml}
```
</p> </details>

Train on dataset generated by BlenderNerf

<details><summary>CLICK ME</summary> <p>

BlenderNerf is a Blender plugin for generating NeRF datasets. A dataset generated by BlenderNerf can be in the Instant-NGP format, so the same script can convert it into the required format, and the mesh can be easily exported from Blender. To generate the dataset:

```bash
python tools/prepare_InstantNGP_with_mesh.py \
    --transforms_train {path to transform_train.json} \
    --mesh_path {path to stl file} \
    --mesh_sample_points {number of points to sample on the mesh, default to be 500} \
    --val_sample {if sample val from train, sample by every n frames, default to be 8} \
    --image_path_prefix {absolute path of the directory contain the train dir} \
    --output_path {any path you want}
python gaussian_point_train.py --train_config {path to config yaml}
```
</p> </details>

Train on dataset generated by other methods

<details><summary>CLICK ME</summary> <p>

See this file for how to prepare the dataset.

</p> </details>

Run

```bash
python gaussian_point_train.py --train_config {path to config file}
```

The training process works in the following way:

```mermaid
stateDiagram-v2
    state WeightToTrain {
        sparsePointCloud
        pointCloudExtraFeatures
    }
    WeightToTrain --> Rasterizer: input
    cameraPose --> Rasterizer: input
    Rasterizer --> Loss: rasterized image
    ImageFromMultiViews --> Loss
    Loss --> Rasterizer: gradient
    Rasterizer --> WeightToTrain: gradient
```
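
The gradient flow above can be illustrated with a toy 1D analogue, which is not the repo's rasterizer: each point splats a Gaussian onto a row of pixels, and the trainable per-point colors are optimized by gradient descent against a target image rendered from known colors.

```python
import math

# Toy 1D "splatting": each point contributes color * gaussian(pixel - center).
centers = [2.0, 6.0]          # fixed point positions
sigmas = [1.0, 1.5]           # fixed per-point spread (stand-in for covariance)
colors = [0.0, 0.0]           # trainable "extra feature", initialized to zero
pixels = list(range(10))

def render(colors):
    """Differentiable rasterizer analogue: sum of Gaussian splats per pixel."""
    return [sum(c * math.exp(-(x - m) ** 2 / (2 * s * s))
                for c, m, s in zip(colors, centers, sigmas))
            for x in pixels]

target = render([0.8, 0.3])   # "ground truth" image from known colors

# Gradient descent on mean squared error; rendering is linear in color,
# so the analytic gradient below is exact.
lr = 0.5
for _ in range(200):
    img = render(colors)
    grads = [0.0, 0.0]
    for x, o, t in zip(pixels, img, target):
        err = o - t
        for i, (m, s) in enumerate(zip(centers, sigmas)):
            grads[i] += 2 * err * math.exp(-(x - m) ** 2 / (2 * s * s))
    colors = [c - lr * g / len(pixels) for c, g in zip(colors, grads)]

print([round(c, 3) for c in colors])  # recovers approximately [0.8, 0.3]
```

The real pipeline replaces this with a CUDA/Taichi rasterizer and also optimizes positions, covariances, and spherical-harmonics colors, but the loop structure (render, compare to ground truth, backpropagate through the rasterizer) is the same.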

The result is visualized in tensorboard; the tensorboard log is stored in the output directory specified in the config file. The trained point cloud with features is also stored as a parquet file in an output directory specified in the config file.

Run on colab (to take advantage of google provided GPU accelerators)

You can find the related notebook here: /tools/run_3d_gaussian_splatting_on_colab.ipynb

  1. Set the hardware accelerator in colab: "Runtime->Change Runtime Type->Hardware accelerator->select GPU->select T4"
  2. Upload this repo to corresponding folder in your google drive.
  3. Mount your google drive to your notebook (see notebook).
  4. Install condacolab (see notebook).
  5. Install requirements.txt with pip (see notebook).
  6. Install pytorch, torchvision, pytorch-cuda etc. with conda (see notebook).
  7. Prepare the dataset as instructed in https://github.com/wanmeihuali/taichi_3d_gaussian_splatting#dataset
  8. Run the trainer with correct config (see notebook).
  9. Check out the training process through tensorboard (see notebook).

Visualization

A simple visualizer is provided. It is implemented with the Taichi GUI, which limits the FPS to 60 (if anyone knows how to lift this limitation, please ping me). The visualizer takes one or more parquet results as input. Example parquets can be downloaded here.

```bash
python3 visualizer --parquet_path_list <parquet_path_0> <parquet_path_1> ...
```

The visualizer merges multiple point clouds and displays them in the same scene.
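
Since each result is just a table of points and per-point features, merging scenes amounts to row-wise concatenation of the tables. A sketch with pandas, using illustrative column names (the real parquet files carry more feature columns, such as covariance and spherical-harmonics color coefficients):

```python
import pandas as pd

# In practice each scene would come from pd.read_parquet("<result>.parquet");
# here we build two tiny in-memory point tables instead.
scene_a = pd.DataFrame({"x": [0.0, 1.0], "y": [0.0, 1.0], "z": [0.0, 1.0]})
scene_b = pd.DataFrame({"x": [5.0, 6.0], "y": [0.0, 1.0], "z": [0.0, 1.0]})

# Merging: stack the rows and reset the index; all 4 points now live
# in one scene and can be rendered together.
merged = pd.concat([scene_a, scene_b], ignore_index=True)
print(len(merged))
```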

How to contribute/Use CI to train on cloud

I've enabled CI and cloud-based training. The feature is not very stable yet, but it enables anyone to contribute to this repo even without a GPU. Generally, the workflow is:

  1. For any algorithm improvement, please create a new branch and make a pull request.
  2. Please @wanmeihuali in the pull request, and I will check the code and add a label need_experiment or need_experiment_garden or need_experiment_tat_truck to the pull request.
  3. The CI will automatically build the docker image and upload it to AWS ECR, and cloud-based training will then be triggered. The training result will be uploaded to the pull request as a comment, e.g. this PR. The dataset is generated by the default config of colmap. The training runs on a g4dn.xlarge Spot Instance (NVIDIA T4, a weaker GPU than a 3090/A6000) and usually takes 2-3 hours.
  4. Currently the best training result in README.md is updated manually. I will try to automate this process in the future.

The current implementation is based on my understanding of the paper, so it will have some differences from the paper/official implementation (they plan to release their code in July). As a personal project, the parameters are not tuned well; I will try to improve performance in the future. Feel free to open an issue if you have any questions, and PRs are welcome, especially any performance improvement.

TODO

Algorithm part

Engineering part